Postnatal environmental exposures, particularly chemicals found in household products and exposures related to dietary intake, together with specific serum metabolomic profiles, are hypothesized to be associated with the BMI Z-score of children aged 6-11 years. Specifically, higher serum concentrations of certain metabolites, reflecting exposure to particular chemical classes or metals, are expected to correlate with variation in BMI Z-score after adjusting for age and other relevant covariates. Some metabolites associated with chemical exposures and dietary patterns may also serve as biomarkers of obesity risk.
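To make the hypothesis concrete, here is a minimal sketch of the kind of covariate-adjusted model it implies. The data are simulated and the column names (`zbmi`, `exposure_log2`, `age`, `sex`) are hypothetical; this is an illustration, not the study's final model. Because the chemical exposures in the codebook are log2-transformed, the exposure coefficient can be read as the change in BMI Z-score per doubling of concentration.

```r
# Minimal sketch on simulated data; variable names are hypothetical.
set.seed(42)
n <- 500
sim <- data.frame(
  age = runif(n, 6, 11),                                   # child age in years
  sex = factor(sample(c("female", "male"), n, replace = TRUE)),
  exposure = rlnorm(n, meanlog = 0, sdlog = 1)             # raw concentration
)
sim$exposure_log2 <- log2(sim$exposure)

# Assumed data-generating model: +0.10 BMI z per doubling of exposure
sim$zbmi <- 0.10 * sim$exposure_log2 + 0.05 * (sim$age - 8.5) + rnorm(n, sd = 1)

# Exposure effect adjusted for age and sex
fit <- lm(zbmi ~ exposure_log2 + age + sex, data = sim)
beta <- unname(coef(fit)["exposure_log2"])

# beta estimates the change in BMI z-score per doubling of the exposure
print(round(beta, 3))
```

In the real analysis the covariate list would follow the codebook (child age, sex, cohort, maternal education, year of birth).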
Research indicates that postnatal exposure to endocrine-disrupting chemicals (EDCs) such as phthalates, bisphenol A (BPA), and polychlorinated biphenyls (PCBs) can influence body weight and metabolic health (Junge et al., 2018). These chemicals are commonly found in household products and are absorbed through dietary intake. Because EDCs can interfere with the hormonal signaling that regulates metabolism and fat storage, this interference may increase body mass index (BMI) in children, suggesting a potential pathway through which exposure to these chemicals contributes to the development of obesity.
A longitudinal study of Japanese children examined the impact of postnatal exposure during the first two years of life to p,p’-dichlorodiphenyltrichloroethane (p,p’-DDT) and p,p’-dichlorodiphenyldichloroethylene (p,p’-DDE) through breastfeeding (Plouffe et al., 2020). Higher levels of these chemicals in breast milk were associated with increased BMI at 42 months of age. DDT and DDE may mimic or disrupt hormones that regulate growth, metabolism, and fat accumulation. The study highlights the importance of understanding how persistent organic pollutants affect early childhood growth and development.
Harley et al. (2013) investigated the association between prenatal and postnatal bisphenol A (BPA) exposure and several body composition measures in 9-year-old children from the CHAMACOS cohort. Higher prenatal BPA exposure was associated with lower BMI and body fat percentage in girls but not in boys, suggesting sex-specific effects. In contrast, BPA levels measured at age 9 were positively associated with adiposity in both sexes, highlighting that the timing of exposure may shape its effect on childhood development.
The 2022 study by Uldbjerg et al. explored the effects of combined exposure to multiple EDCs. Because humans are typically exposed to mixtures of chemicals rather than to individual EDCs, it is important to understand how these mixtures interact. The study suggested that EDC mixtures can have additive effects (where the individual effects simply add up) or even synergistic effects (where the combined effect is greater than the sum of the separate effects) on BMI and obesity risk, which can amplify the risk factors for obesity and metabolic disorders in children. The dose-response analysis indicated that even low-level exposure to multiple EDCs could have meaningful health impacts because of these combined effects.
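One simple way to express the additive-versus-synergistic distinction is a product (interaction) term in a linear model: on that model's scale, additivity means the interaction coefficient is zero and synergy means it is positive. The sketch below uses simulated data with two hypothetical log2 exposures, `a` and `b`. It is not the method used by Uldbjerg et al.; dedicated mixture methods such as weighted quantile sum regression or Bayesian kernel machine regression are common in this literature.

```r
# Minimal sketch on simulated data; a and b are hypothetical exposures.
set.seed(1)
n <- 2000
a <- rnorm(n)   # hypothetical log2 exposure to chemical A
b <- rnorm(n)   # hypothetical log2 exposure to chemical B

# Assumed truth: each chemical adds 0.1, and joint exposure adds a further 0.1 * a * b
y <- 0.1 * a + 0.1 * b + 0.1 * a * b + rnorm(n)

# a * b expands to a + b + a:b; the a:b coefficient measures departure from additivity
fit <- lm(y ~ a * b)
interaction <- unname(coef(fit)["a:b"])
print(round(interaction, 3))
```

A positive, well-estimated `a:b` coefficient would indicate synergy on the linear scale; an estimate near zero would be consistent with purely additive effects.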
These studies collectively illustrate the critical role of environmental EDCs in shaping metabolic health outcomes in children, highlighting the necessity for ongoing research and policy intervention to mitigate these risks.
This study will use data from the HELIX subcohort of 1301 mother-child pairs with children aged 6-11 years for whom complete exposure and outcome data were available. Exposure data include postnatal dietary records and concentrations of chemicals such as PCBs (measured in child blood and adjusted for lipids) and BPA (measured in child urine and adjusted for creatinine). The dataset contains both categorical and numerical variables, covering demographic details and biochemical measurements. It allows statistical analysis of potential associations between EDC exposure and BMI Z-score while accounting for confounding factors such as age, sex, and socioeconomic status. Because there are no missing data, no imputation is needed. Child BMI Z-scores were calculated using the WHO growth reference, standardized on sex and age.
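WHO BMI-for-age Z-scores are based on the LMS method, in which each sex and age stratum has a Box-Cox power (L), a median (M), and a coefficient of variation (S). A minimal sketch of the transformation follows. The L, M, and S values are made-up placeholders, not WHO table values, and the special handling the WHO software applies to extreme Z-scores is omitted.

```r
# LMS transformation: z = ((y / M)^L - 1) / (L * S), or log(y / M) / S when L = 0
lms_z <- function(y, L, M, S) {
  if (L == 0) {
    log(y / M) / S
  } else {
    ((y / M)^L - 1) / (L * S)
  }
}

# Hypothetical reference parameters for one sex/age stratum (not real WHO values)
L <- -1.5
M <- 16.0
S <- 0.12

bmi <- c(14.5, 16.0, 18.2)   # hypothetical BMI values (kg/m^2)
z <- lms_z(bmi, L, M, S)
print(round(z, 2))
```

A BMI equal to the median M always maps to a Z-score of 0, and higher BMI maps to higher Z-scores regardless of the sign of L.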
library(dplyr)       # filter(), bind_rows(), %>%
library(knitr)       # kable()
library(kableExtra)  # kable_styling()
# load the HELIX workspace, which provides the codebook and exposome objects used below
load("/Users/allison/Library/CloudStorage/GoogleDrive-aflouie@usc.edu/My Drive/HELIX_data/HELIX.RData")
# postnatal chemical and lifestyle exposures, excluding allergens
filtered_chem_diet <- codebook %>%
  filter(domain %in% c("Chemicals", "Lifestyles") & period == "Postnatal" & subfamily != "Allergens")
# specific covariates
filtered_covariates <- codebook %>%
  filter(domain == "Covariates" &
           variable_name %in% c("ID", "e3_sex_None", "e3_yearbir_None", "h_edumc_None", "h_cohort", "hs_child_age_None"))
# specific phenotype variable: BMI z-score (WHO reference)
filtered_phenotype <- codebook %>%
  filter(domain == "Phenotype" &
           variable_name %in% c("hs_zbmi_who"))
# combine all necessary variables and display them as an HTML table
combined_codebook <- bind_rows(filtered_chem_diet, filtered_covariates, filtered_phenotype)
kable(combined_codebook, align = "c", format = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
| | variable_name | domain | family | subfamily | period | location | period_postnatal | description | var_type | transformation | labels | labelsshort |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| h_bfdur_Ter | h_bfdur_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Breastfeeding duration (weeks) | factor | Tertiles | Breastfeeding | Breastfeeding |
| hs_bakery_prod_Ter | hs_bakery_prod_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: bakery products (hs_cookies + hs_pastries) | factor | Tertiles | Bakery prod | BakeProd |
| hs_beverages_Ter | hs_beverages_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: beverages (hs_dietsoda+hs_soda) | factor | Tertiles | Soda | Soda |
| hs_break_cer_Ter | hs_break_cer_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: breakfast cereal (hs_sugarcer+hs_othcer) | factor | Tertiles | BF cereals | BFcereals |
| hs_caff_drink_Ter | hs_caff_drink_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Drinks a caffeinated or energy drink (eg coca-cola, diet-coke, redbull) | factor | Tertiles | Caffeine | Caffeine |
| hs_dairy_Ter | hs_dairy_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: dairy (hs_cheese + hs_milk + hs_yogurt+ hs_probiotic+ hs_desert) | factor | Tertiles | Dairy | Dairy |
| hs_fastfood_Ter | hs_fastfood_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Visits a fast food restaurant/take away | factor | Tertiles | Fastfood | Fastfood |
| hs_KIDMED_None | hs_KIDMED_None | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Sum of KIDMED indices, without index9 | numeric | None | KIDMED | KIDMED |
| hs_mvpa_prd_alt_None | hs_mvpa_prd_alt_None | Lifestyles | Lifestyle | Physical activity | Postnatal | NA | NA | Clean & Over-reporting of Moderate-to-Vigorous Physical Activity (min/day) | numeric | None | PA | PA |
| hs_org_food_Ter | hs_org_food_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Eats organic food | factor | Tertiles | Organicfood | Organicfood |
| hs_proc_meat_Ter | hs_proc_meat_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: processed meat (hs_coldmeat+hs_ham) | factor | Tertiles | Processed meat | ProcMeat |
| hs_readymade_Ter | hs_readymade_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Eats a ready-made supermarket meal | factor | Tertiles | Ready made food | ReadyFood |
| hs_sd_wk_None | hs_sd_wk_None | Lifestyles | Lifestyle | Physical activity | Postnatal | NA | NA | sedentary behaviour (min/day) | numeric | None | Sedentary | Sedentary |
| hs_total_bread_Ter | hs_total_bread_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: bread (hs_darkbread+hs_whbread) | factor | Tertiles | Bread | Bread |
| hs_total_cereal_Ter | hs_total_cereal_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: cereal (hs_darkbread + hs_whbread + hs_rice_pasta + hs_sugarcer + hs_othcer + hs_rusks) | factor | Tertiles | Cereals | Cereals |
| hs_total_fish_Ter | hs_total_fish_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: fish and seafood (hs_canfish+hs_oilyfish+hs_whfish+hs_seafood) | factor | Tertiles | Fish | Fish |
| hs_total_fruits_Ter | hs_total_fruits_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: fruits (hs_canfruit+hs_dryfruit+hs_freshjuice+hs_fruits) | factor | Tertiles | Fruits | Fruits |
| hs_total_lipids_Ter | hs_total_lipids_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: Added fat | factor | Tertiles | Diet fat | Diet fat |
| hs_total_meat_Ter | hs_total_meat_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: meat (hs_coldmeat+hs_ham+hs_poultry+hs_redmeat) | factor | Tertiles | Meat | Meat |
| hs_total_potatoes_Ter | hs_total_potatoes_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: potatoes (hs_frenchfries+hs_potatoes) | factor | Tertiles | Potatoes | Potatoes |
| hs_total_sweets_Ter | hs_total_sweets_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: sweets (hs_choco + hs_sweets + hs_sugar) | factor | Tertiles | Sweets | Sweets |
| hs_total_veg_Ter | hs_total_veg_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: vegetables (hs_cookveg+hs_rawveg) | factor | Tertiles | Vegetables | Vegetables |
| hs_total_yog_Ter | hs_total_yog_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: yogurt (hs_yogurt+hs_probiotic) | factor | Tertiles | Yogurt | Yogurt |
| hs_dif_hours_total_None | hs_dif_hours_total_None | Lifestyles | Lifestyle | Sleep | Postnatal | NA | NA | Total hours of sleep (mean weekdays and night) | numeric | None | Sleep | Sleep |
| hs_as_c_Log2 | hs_as_c_Log2 | Chemicals | Metals | As | Postnatal | NA | NA | Arsenic (As) in child | numeric | Logarithm base 2 | As | As |
| hs_cd_c_Log2 | hs_cd_c_Log2 | Chemicals | Metals | Cd | Postnatal | NA | NA | Cadmium (Cd) in child | numeric | Logarithm base 2 | Cd | Cd |
| hs_co_c_Log2 | hs_co_c_Log2 | Chemicals | Metals | Co | Postnatal | NA | NA | Cobalt (Co) in child | numeric | Logarithm base 2 | Co | Co |
| hs_cs_c_Log2 | hs_cs_c_Log2 | Chemicals | Metals | Cs | Postnatal | NA | NA | Caesium (Cs) in child | numeric | Logarithm base 2 | Cs | Cs |
| hs_cu_c_Log2 | hs_cu_c_Log2 | Chemicals | Metals | Cu | Postnatal | NA | NA | Copper (Cu) in child | numeric | Logarithm base 2 | Cu | Cu |
| hs_hg_c_Log2 | hs_hg_c_Log2 | Chemicals | Metals | Hg | Postnatal | NA | NA | Mercury (Hg) in child | numeric | Logarithm base 2 | Hg | Hg |
| hs_mn_c_Log2 | hs_mn_c_Log2 | Chemicals | Metals | Mn | Postnatal | NA | NA | Manganese (Mn) in child | numeric | Logarithm base 2 | Mn | Mn |
| hs_mo_c_Log2 | hs_mo_c_Log2 | Chemicals | Metals | Mo | Postnatal | NA | NA | Molybdenum (Mo) in child | numeric | Logarithm base 2 | Mo | Mo |
| hs_pb_c_Log2 | hs_pb_c_Log2 | Chemicals | Metals | Pb | Postnatal | NA | NA | Lead (Pb) in child | numeric | Logarithm base 2 | Pb | Pb |
| hs_tl_cdich_None | hs_tl_cdich_None | Chemicals | Metals | Tl | Postnatal | NA | NA | Dichotomous variable of thallium (Tl) in child | factor | None | Tl | Tl |
| hs_dde_cadj_Log2 | hs_dde_cadj_Log2 | Chemicals | Organochlorines | DDE | Postnatal | NA | NA | Dichlorodiphenyldichloroethylene (DDE) in child adjusted for lipids | numeric | Logarithm base 2 | DDE | DDE |
| hs_ddt_cadj_Log2 | hs_ddt_cadj_Log2 | Chemicals | Organochlorines | DDT | Postnatal | NA | NA | Dichlorodiphenyltrichloroethane (DDT) in child adjusted for lipids | numeric | Logarithm base 2 | DDT | DDT |
| hs_hcb_cadj_Log2 | hs_hcb_cadj_Log2 | Chemicals | Organochlorines | HCB | Postnatal | NA | NA | Hexachlorobenzene (HCB) in child adjusted for lipids | numeric | Logarithm base 2 | HCB | HCB |
| hs_pcb118_cadj_Log2 | hs_pcb118_cadj_Log2 | Chemicals | Organochlorines | PCBs | Postnatal | NA | NA | Polychlorinated biphenyl -118 (PCB-118) in child adjusted for lipids | numeric | Logarithm base 2 | PCB 118 | PCB118 |
| hs_pcb138_cadj_Log2 | hs_pcb138_cadj_Log2 | Chemicals | Organochlorines | PCBs | Postnatal | NA | NA | Polychlorinated biphenyl-138 (PCB-138) in child adjusted for lipids | numeric | Logarithm base 2 | PCB 138 | PCB138 |
| hs_pcb153_cadj_Log2 | hs_pcb153_cadj_Log2 | Chemicals | Organochlorines | PCBs | Postnatal | NA | NA | Polychlorinated biphenyl-153 (PCB-153) in child adjusted for lipids | numeric | Logarithm base 2 | PCB 153 | PCB153 |
| hs_pcb170_cadj_Log2 | hs_pcb170_cadj_Log2 | Chemicals | Organochlorines | PCBs | Postnatal | NA | NA | Polychlorinated biphenyl-170 (PCB-170) in child adjusted for lipids | numeric | Logarithm base 2 | PCB 170 | PCB170 |
| hs_pcb180_cadj_Log2 | hs_pcb180_cadj_Log2 | Chemicals | Organochlorines | PCBs | Postnatal | NA | NA | Polychlorinated biphenyl-180 (PCB-180) in child adjusted for lipids | numeric | Logarithm base 2 | PCB 180 | PCB180 |
| hs_sumPCBs5_cadj_Log2 | hs_sumPCBs5_cadj_Log2 | Chemicals | Organochlorines | PCBs | Postnatal | NA | NA | Sum of PCBs in child adjusted for lipids (4 cohorts) | numeric | Logarithm base 2 | PCBs | SumPCB |
| hs_dep_cadj_Log2 | hs_dep_cadj_Log2 | Chemicals | Organophosphate pesticides | DEP | Postnatal | NA | NA | Diethyl phosphate (DEP) in child adjusted for creatinine | numeric | Logarithm base 2 | DEP | DEP |
| hs_detp_cadj_Log2 | hs_detp_cadj_Log2 | Chemicals | Organophosphate pesticides | DETP | Postnatal | NA | NA | Diethyl thiophosphate (DETP) in child adjusted for creatinine | numeric | Logarithm base 2 | DETP | DETP |
| hs_dmdtp_cdich_None | hs_dmdtp_cdich_None | Chemicals | Organophosphate pesticides | DMDTP | Postnatal | NA | NA | Dichotomous variable of dimethyl dithiophosphate (DMDTP) in child | factor | None | DMDTP | DMDTP |
| hs_dmp_cadj_Log2 | hs_dmp_cadj_Log2 | Chemicals | Organophosphate pesticides | DMP | Postnatal | NA | NA | Dimethyl phosphate (DMP) in child adjusted for creatinine | numeric | Logarithm base 2 | DMP | DMP |
| hs_dmtp_cadj_Log2 | hs_dmtp_cadj_Log2 | Chemicals | Organophosphate pesticides | DMTP | Postnatal | NA | NA | Dimethyl thiophosphate (DMTP) in child adjusted for creatinine | numeric | Logarithm base 2 | DMDTP | DMTP |
| hs_pbde153_cadj_Log2 | hs_pbde153_cadj_Log2 | Chemicals | Polybrominated diphenyl ethers (PBDE) | PBDE153 | Postnatal | NA | NA | Polybrominated diphenyl ether-153 (PBDE-153) in child adjusted for lipids | numeric | Logarithm base 2 | PBDE 153 | PBDE153 |
| hs_pbde47_cadj_Log2 | hs_pbde47_cadj_Log2 | Chemicals | Polybrominated diphenyl ethers (PBDE) | PBDE47 | Postnatal | NA | NA | Polybrominated diphenyl ether-47 (PBDE-47) in child adjusted for lipids | numeric | Logarithm base 2 | PBDE 47 | PBDE47 |
| hs_pfhxs_c_Log2 | hs_pfhxs_c_Log2 | Chemicals | Per- and polyfluoroalkyl substances (PFAS) | PFHXS | Postnatal | NA | NA | Perfluorohexane sulfonate (PFHXS) in child | numeric | Logarithm base 2 | PFHXS | PFHXS |
| hs_pfna_c_Log2 | hs_pfna_c_Log2 | Chemicals | Per- and polyfluoroalkyl substances (PFAS) | PFNA | Postnatal | NA | NA | Perfluorononanoate (PFNA) in child | numeric | Logarithm base 2 | PFNA | PFNA |
| hs_pfoa_c_Log2 | hs_pfoa_c_Log2 | Chemicals | Per- and polyfluoroalkyl substances (PFAS) | PFOA | Postnatal | NA | NA | Perfluorooctanoate (PFOA) in child | numeric | Logarithm base 2 | PFOA | PFOA |
| hs_pfos_c_Log2 | hs_pfos_c_Log2 | Chemicals | Per- and polyfluoroalkyl substances (PFAS) | PFOS | Postnatal | NA | NA | Perfluorooctane sulfonate (PFOS) in child | numeric | Logarithm base 2 | PFOS | PFOS |
| hs_pfunda_c_Log2 | hs_pfunda_c_Log2 | Chemicals | Per- and polyfluoroalkyl substances (PFAS) | PFUNDA | Postnatal | NA | NA | Perfluoroundecanoate (PFUNDA) in child | numeric | Logarithm base 2 | PFUNDA | PFUNDA |
| hs_bpa_cadj_Log2 | hs_bpa_cadj_Log2 | Chemicals | Phenols | BPA | Postnatal | NA | NA | Bisphenol A (BPA) in child adjusted for creatinine | numeric | Logarithm base 2 | BPA | BPA |
| hs_bupa_cadj_Log2 | hs_bupa_cadj_Log2 | Chemicals | Phenols | BUPA | Postnatal | NA | NA | N-Butyl paraben (BUPA) in child adjusted for creatinine | numeric | Logarithm base 2 | BUPA | BUPA |
| hs_etpa_cadj_Log2 | hs_etpa_cadj_Log2 | Chemicals | Phenols | ETPA | Postnatal | NA | NA | Ethyl paraben (ETPA) in child adjusted for creatinine | numeric | Logarithm base 2 | ETPA | ETPA |
| hs_mepa_cadj_Log2 | hs_mepa_cadj_Log2 | Chemicals | Phenols | MEPA | Postnatal | NA | NA | Methyl paraben (MEPA) in child adjusted for creatinine | numeric | Logarithm base 2 | MEPA | MEPA |
| hs_oxbe_cadj_Log2 | hs_oxbe_cadj_Log2 | Chemicals | Phenols | OXBE | Postnatal | NA | NA | Oxybenzone (OXBE) in child adjusted for creatinine | numeric | Logarithm base 2 | OXBE | OXBE |
| hs_prpa_cadj_Log2 | hs_prpa_cadj_Log2 | Chemicals | Phenols | PRPA | Postnatal | NA | NA | Propyl paraben (PRPA) in child adjusted for creatinine | numeric | Logarithm base 2 | PRPA | PRPA |
| hs_trcs_cadj_Log2 | hs_trcs_cadj_Log2 | Chemicals | Phenols | TRCS | Postnatal | NA | NA | Triclosan (TRCS) in child adjusted for creatinine | numeric | Logarithm base 2 | TRCS | TRCS |
| hs_mbzp_cadj_Log2 | hs_mbzp_cadj_Log2 | Chemicals | Phthalates | MBZP | Postnatal | NA | NA | Mono benzyl phthalate (MBzP) in child adjusted for creatinine | numeric | Logarithm base 2 | MBZP | MBZP |
| hs_mecpp_cadj_Log2 | hs_mecpp_cadj_Log2 | Chemicals | Phthalates | MECPP | Postnatal | NA | NA | Mono-2-ethyl 5-carboxypentyl phthalate (MECPP) in child adjusted for creatinine | numeric | Logarithm base 2 | MECPP | MECPP |
| hs_mehhp_cadj_Log2 | hs_mehhp_cadj_Log2 | Chemicals | Phthalates | MEHHP | Postnatal | NA | NA | Mono-2-ethyl-5-hydroxyhexyl phthalate (MEHHP) in child adjusted for creatinine | numeric | Logarithm base 2 | MEHHP | MEHHP |
| hs_mehp_cadj_Log2 | hs_mehp_cadj_Log2 | Chemicals | Phthalates | MEHP | Postnatal | NA | NA | Mono-2-ethylhexyl phthalate (MEHP) in child adjusted for creatinine | numeric | Logarithm base 2 | MEHP | MEHP |
| hs_meohp_cadj_Log2 | hs_meohp_cadj_Log2 | Chemicals | Phthalates | MEOHP | Postnatal | NA | NA | Mono-2-ethyl-5-oxohexyl phthalate (MEOHP) in child adjusted for creatinine | numeric | Logarithm base 2 | MEOHP | MEOHP |
| hs_mep_cadj_Log2 | hs_mep_cadj_Log2 | Chemicals | Phthalates | MEP | Postnatal | NA | NA | Monoethyl phthalate (MEP) in child adjusted for creatinine | numeric | Logarithm base 2 | MEP | MEP |
| hs_mibp_cadj_Log2 | hs_mibp_cadj_Log2 | Chemicals | Phthalates | MIBP | Postnatal | NA | NA | Mono-iso-butyl phthalate (MiBP) in child adjusted for creatinine | numeric | Logarithm base 2 | MIBP | MIBP |
| hs_mnbp_cadj_Log2 | hs_mnbp_cadj_Log2 | Chemicals | Phthalates | MNBP | Postnatal | NA | NA | Mono-n-butyl phthalate (MnBP) in child adjusted for creatinine | numeric | Logarithm base 2 | MNBP | MNBP |
| hs_ohminp_cadj_Log2 | hs_ohminp_cadj_Log2 | Chemicals | Phthalates | OHMiNP | Postnatal | NA | NA | Mono-4-methyl-7-hydroxyoctyl phthalate (OHMiNP) in child adjusted for creatinine | numeric | Logarithm base 2 | OHMiNP | OHMiNP |
| hs_oxominp_cadj_Log2 | hs_oxominp_cadj_Log2 | Chemicals | Phthalates | OXOMINP | Postnatal | NA | NA | Mono-4-methyl-7-oxooctyl phthalate (OXOMiNP) in child adjusted for creatinine | numeric | Logarithm base 2 | OXOMINP | OXOMINP |
| hs_sumDEHP_cadj_Log2 | hs_sumDEHP_cadj_Log2 | Chemicals | Phthalates | DEHP | Postnatal | NA | NA | Sum of DEHP metabolites (µg/g) in child adjusted for creatinine | numeric | Logarithm base 2 | DEHP | SumDEHP |
| FAS_cat_None | FAS_cat_None | Chemicals | Social and economic capital | Economic capital | Postnatal | NA | NA | Family affluence score | factor | None | Family affluence | FamAfl |
| hs_contactfam_3cat_num_None | hs_contactfam_3cat_num_None | Chemicals | Social and economic capital | Social capital | Postnatal | NA | NA | scoial capital: family friends | factor | None | Social contact | SocCont |
| hs_hm_pers_None | hs_hm_pers_None | Chemicals | Social and economic capital | Social capital | Postnatal | NA | NA | How many people live in your home? | numeric | None | House crowding | HouseCrow |
| hs_participation_3cat_None | hs_participation_3cat_None | Chemicals | Social and economic capital | Social capital | Postnatal | NA | NA | social capital: structural | factor | None | Social participation | SocPartic |
| hs_cotinine_cdich_None | hs_cotinine_cdich_None | Chemicals | Tobacco Smoke | Cotinine | Postnatal | NA | NA | Dichotomous variable of cotinine in child | factor | None | Cotinine | Cotinine |
| hs_globalexp2_None | hs_globalexp2_None | Chemicals | Tobacco Smoke | Tobacco Smoke | Postnatal | NA | NA | Global exposure of the child to ETS (2 categories) | factor | None | ETS | ETS |
| hs_smk_parents_None | hs_smk_parents_None | Chemicals | Tobacco Smoke | Tobacco Smoke | Postnatal | NA | NA | Tobacco Smoke status of parents (both) | factor | None | Smoking_parents | SmokPar |
| e3_sex_None | e3_sex_None | Covariates | Covariates | Child covariate | Pregnancy | NA | NA | Child sex (female / male) | factor | None | Child sex | Sex |
| e3_yearbir_None | e3_yearbir_None | Covariates | Covariates | Child covariate | Pregnancy | NA | NA | Year of birth (2003 to 2009) | factor | None | Year of birth | YearBirth |
| h_cohort | h_cohort | Covariates | Covariates | Maternal covariate | Pregnancy | NA | NA | Cohort of inclusion (1 to 6) | factor | None | Cohort | Cohort |
| h_edumc_None | h_edumc_None | Covariates | Covariates | Maternal covariate | Pregnancy | NA | NA | Maternal education (1: primary school, 2:secondary school, 3:university degree or higher) | factor | None | Maternal education | mEducation |
| hs_child_age_None | hs_child_age_None | Covariates | Covariates | Child covariate | Postnatal | NA | NA | Child age at examination (years) | numeric | None | Child age | cAge |
| hs_zbmi_who | hs_zbmi_who | Phenotype | Phenotype | Outcome at 6-11 years old | Postnatal | NA | NA | Body mass index z-score at 6-11 years old - WHO reference - Standardized on sex and age | numeric | None | Body mass index z-score | zBMI |
library(summarytools)  # dfSummary(), view()
# lifestyle exposures selected from the combined codebook
Lifestyle_Exposures <- combined_codebook$variable_name[combined_codebook$domain == "Lifestyles"]
lifestyle_exposome <- dplyr::select(exposome, all_of(Lifestyle_Exposures))
summarytools::view(summarytools::dfSummary(lifestyle_exposome, style = "grid", plain.ascii = FALSE, valid.col = FALSE, headings = FALSE), method = "render")
| No | Variable | Type | Distinct values | Missing |
|---|---|---|---|---|
| 1 | h_bfdur_Ter | factor | | 0 (0.0%) |
| 2 | hs_bakery_prod_Ter | factor | | 0 (0.0%) |
| 3 | hs_beverages_Ter | factor | | 0 (0.0%) |
| 4 | hs_break_cer_Ter | factor | | 0 (0.0%) |
| 5 | hs_caff_drink_Ter | factor | | 0 (0.0%) |
| 6 | hs_dairy_Ter | factor | | 0 (0.0%) |
| 7 | hs_fastfood_Ter | factor | | 0 (0.0%) |
| 8 | hs_KIDMED_None | numeric | 13 | 0 (0.0%) |
| 9 | hs_mvpa_prd_alt_None | numeric | 847 | 0 (0.0%) |
| 10 | hs_org_food_Ter | factor | | 0 (0.0%) |
| 11 | hs_proc_meat_Ter | factor | | 0 (0.0%) |
| 12 | hs_readymade_Ter | factor | | 0 (0.0%) |
| 13 | hs_sd_wk_None | numeric | 368 | 0 (0.0%) |
| 14 | hs_total_bread_Ter | factor | | 0 (0.0%) |
| 15 | hs_total_cereal_Ter | factor | | 0 (0.0%) |
| 16 | hs_total_fish_Ter | factor | | 0 (0.0%) |
| 17 | hs_total_fruits_Ter | factor | | 0 (0.0%) |
| 18 | hs_total_lipids_Ter | factor | | 0 (0.0%) |
| 19 | hs_total_meat_Ter | factor | | 0 (0.0%) |
| 20 | hs_total_potatoes_Ter | factor | | 0 (0.0%) |
| 21 | hs_total_sweets_Ter | factor | | 0 (0.0%) |
| 22 | hs_total_veg_Ter | factor | | 0 (0.0%) |
| 23 | hs_total_yog_Ter | factor | | 0 (0.0%) |
| 24 | hs_dif_hours_total_None | numeric | 437 | 0 (0.0%) |
Generated by summarytools 1.0.1 (R version 4.4.0)
2024-07-01
library(tidyr)    # pivot_longer()
library(ggplot2)  # ggplot(), geom_histogram()
# separate the numeric lifestyle variables and reshape them to long format
numeric_lifestyle <- lifestyle_exposome %>%
  dplyr::select(where(is.numeric))
numeric_lifestyle_long <- pivot_longer(
  numeric_lifestyle,
  cols = everything(),
  names_to = "variable",
  values_to = "value"
)
unique_numerical_vars <- unique(numeric_lifestyle_long$variable)
# one histogram per numeric variable; print() renders each plot inside lapply()
num_plots <- lapply(unique_numerical_vars, function(var) {
  data <- dplyr::filter(numeric_lifestyle_long, variable == var)
  p <- ggplot(data, aes(x = value)) +
    geom_histogram(bins = 30, fill = "blue") +
    labs(title = paste("Histogram of", var), x = "Value", y = "Count")
  print(p)
  p
})
The histogram of the KIDMED sum (excluding index 9), which measures adherence to the Mediterranean diet, shows peaks at scores of 0, 3, 5, 7, and 10. Because KIDMED is an integer score with only 13 distinct values in this sample, a 30-bin histogram leaves empty bins between adjacent scores, so part of this spiky, multimodal appearance likely reflects the discreteness of the score and the binning rather than distinct subgroups of children. The most frequent scores indicate the most common levels of adherence; any apparent clustering of dietary behavior should be confirmed before it is interpreted.
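To check whether the peaks reflect the data or the binning, one option is to count each integer score directly, including scores that no child received (in ggplot2, `geom_histogram(binwidth = 1)` has a similar effect). The scores below are hypothetical placeholders for `hs_KIDMED_None`.

```r
# Hypothetical integer scores standing in for hs_KIDMED_None
kidmed <- c(3, 5, 5, 6, 6, 6, 7, 7, 8, 10)

# One count per integer score between the minimum and maximum, zeros included
score_counts <- table(factor(kidmed, levels = min(kidmed):max(kidmed)))
print(score_counts)
```

Empty scores then appear as explicit zeros instead of as gaps created by the choice of bin width.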
The second histogram shows moderate-to-vigorous physical activity (MVPA) in minutes per day, after cleaning for over-reporting. The distribution is right-skewed: most children report lower levels of physical activity, and a smaller number report very high levels. The peak near the lower end suggests that a substantial portion of the sample engages in low to moderate amounts of MVPA, while the long right tail may still contain some over-reported values or measurement errors despite the cleaning step.
The histogram of sedentary behavior (minutes per day) is slightly left-skewed, with values concentrated toward the right side. Most children therefore have relatively high sedentary time, and fewer have low sedentary time, pointing toward inactivity in this sample. This pattern may raise concerns about lifestyle habits involving prolonged periods of low physical activity.
Total hours of sleep per night (averaged over weekdays and weekends) is approximately normally distributed, with most values clustered around the mean and no pronounced tail toward very short or very long sleep. This suggests fairly consistent sleep duration across children in this sample, although a symmetric distribution alone does not establish that sleep duration is adequate.
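The skew directions described above (right for MVPA, left for sedentary time, roughly symmetric for sleep) can be checked numerically with the moment-based sample skewness, where positive values indicate a longer right tail. The helper `skewness()` and the example vectors are hypothetical illustrations; packages such as e1071 and moments provide skewness functions with slightly different bias conventions.

```r
# Moment-based sample skewness: mean cubed deviation divided by sd^3 (population sd)
skewness <- function(x) {
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))
  mean((x - m)^3) / s^3
}

right_skewed <- c(10, 12, 15, 18, 20, 25, 40, 90)   # long right tail
left_skewed  <- c(10, 60, 75, 80, 85, 88, 90, 92)   # long left tail
symmetric    <- c(7, 8, 9, 10, 11, 12, 13)

print(round(c(right = skewness(right_skewed),
              left  = skewness(left_skewed),
              sym   = skewness(symmetric)), 3))
```

On the real data one could call, for example, `skewness(lifestyle_exposome$hs_mvpa_prd_alt_None)`.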
# separate the categorical (factor) lifestyle variables and reshape them to long format
categorical_lifestyle <- lifestyle_exposome %>%
  dplyr::select(where(is.factor))
categorical_lifestyle_long <- pivot_longer(
  categorical_lifestyle,
  cols = everything(),
  names_to = "variable",
  values_to = "value"
)
unique_categorical_vars <- unique(categorical_lifestyle_long$variable)
# one bar chart of category counts per variable; print() renders each plot inside lapply()
categorical_plots <- lapply(unique_categorical_vars, function(var) {
  data <- dplyr::filter(categorical_lifestyle_long, variable == var)
  p <- ggplot(data, aes(x = value, fill = value)) +
    geom_bar(stat = "count") +
    labs(title = paste("Distribution of", var), x = var, y = "Count")
  print(p)
  p
})
Because these variables are tertiles, roughly equal group sizes would be expected; marked imbalances most likely reflect ties at the cut points (for example, many children reporting the same low frequency of intake).

- Breastfeeding duration: The majority of observations fall in the highest duration category, suggesting that longer breastfeeding periods are common.
- Bakery products: The three categories are relatively evenly populated, indicating varied consumption of bakery products.
- Beverages: Many participants fall into the highest category, indicating relatively high consumption of sodas (regular and diet).
- Breakfast cereal: The highest consumption category is the most common.
- Caffeinated/energy drinks: A large number of participants avoid caffeinated or energy drinks or consume very small amounts.
- Dairy: The distribution is fairly even across all categories.
- Fast food: Most participants fall into the middle category, indicating moderate fast-food consumption.
- Organic food: Most participants fall into the lowest or highest category, with fewer in the middle.
- Processed meat: Consumption is fairly evenly distributed across categories.
- Ready-made meals: Many participants rarely consume ready-made meals, although a substantial number also fall into the highest category.
- Bread: The distribution leans toward higher bread consumption.
- Cereal: The even distribution across categories suggests varied cereal consumption.
- Fish and seafood: Consumption is evenly distributed across categories.
- Fruits: High fruit consumption is the most common, with fewer participants in the lowest category.
- Added fats: More participants fall into the lowest and highest categories than into the middle one.
- Meat: The middle category is the most common.
- Potatoes: Participants tend toward either low or high consumption, with fewer in the middle.
- Sweets: High consumption of sweets is the most common category.
- Vegetables: Most participants consume a high amount of vegetables.
- Yogurt: Participants tend to fall into either the lowest or the highest category, with fewer in the middle.
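A minimal sketch of how ties can unbalance tertiles, assuming the `_Ter` variables were built from quantile cut points (how HELIX constructed them is not described here). The intake values are hypothetical; 30 of them are tied at the lower cut point.

```r
# Hypothetical intake values: 30 tied values at 25 servings
x <- c(1:20, rep(25, 30), 26:75)

# Quantile-based tertile cut points (an assumption about how *_Ter were built)
breaks <- quantile(x, probs = c(0, 1/3, 2/3, 1))
tertile <- cut(x, breaks = breaks, include.lowest = TRUE,
               labels = c("low", "medium", "high"))
print(table(tertile))
```

All tied values land in the same group, so the lowest "third" ends up holding 50 of the 100 observations.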
library(corrplot)  # corrplot()
numeric_lifestyle <- dplyr::select(lifestyle_exposome, where(is.numeric))
# Spearman rank correlation, which is less sensitive to the skewed MVPA and sedentary distributions
cor_matrix <- cor(numeric_lifestyle, method = "spearman")
corrplot(cor_matrix, method = "circle")
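Spearman correlation is the Pearson correlation computed on ranks, which is why it captures monotonic but nonlinear relationships and is less affected by long tails. A small check on simulated, right-skewed data (hypothetical vectors `x` and `y`):

```r
set.seed(7)
x <- rexp(200)                    # right-skewed, loosely like minutes of MVPA
y <- x^2 + rnorm(200, sd = 0.1)   # monotonic but nonlinear relationship

spearman <- cor(x, y, method = "spearman")
pearson_on_ranks <- cor(rank(x), rank(y))
print(c(spearman = spearman, pearson_on_ranks = pearson_on_ranks))
```

Both values agree, and both are larger than the plain Pearson correlation would be for a strongly curved relationship with outliers.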
# variables in the Chemicals domain (in this codebook, this also includes social/economic capital and tobacco smoke)
Chemical_Exposures <- combined_codebook$variable_name[combined_codebook$domain == "Chemicals"]
chemical_exposome <- exposome %>%
  dplyr::select(all_of(Chemical_Exposures))
summarytools::view(summarytools::dfSummary(chemical_exposome, style = "grid", plain.ascii = FALSE, valid.col = FALSE, headings = FALSE), method = "render")
| No | Variable | Type | Distinct values | Missing |
|---|---|---|---|---|
| 1 | hs_as_c_Log2 | numeric | 692 | 0 (0.0%) |
| 2 | hs_cd_c_Log2 | numeric | 695 | 0 (0.0%) |
| 3 | hs_co_c_Log2 | numeric | 317 | 0 (0.0%) |
| 4 | hs_cs_c_Log2 | numeric | 369 | 0 (0.0%) |
| 5 | hs_cu_c_Log2 | numeric | 345 | 0 (0.0%) |
| 6 | hs_hg_c_Log2 | numeric | 698 | 0 (0.0%) |
| 7 | hs_mn_c_Log2 | numeric | 457 | 0 (0.0%) |
| 8 | hs_mo_c_Log2 | numeric | 593 | 0 (0.0%) |
| 9 | hs_pb_c_Log2 | numeric | 529 | 0 (0.0%) |
| 10 | hs_tl_cdich_None | factor | | 0 (0.0%) |
| 11 | hs_dde_cadj_Log2 | numeric | 1050 | 0 (0.0%) |
| 12 | hs_ddt_cadj_Log2 | numeric | 1039 | 0 (0.0%) |
| 13 | hs_hcb_cadj_Log2 | numeric | 1036 | 0 (0.0%) |
| 14 | hs_pcb118_cadj_Log2 | numeric | 1048 | 0 (0.0%) |
| 15 | hs_pcb138_cadj_Log2 | numeric | 1031 | 0 (0.0%) |
| 16 | hs_pcb153_cadj_Log2 | numeric | 1047 | 0 (0.0%) |
| 17 | hs_pcb170_cadj_Log2 | numeric | 1039 | 0 (0.0%) |
| 18 | hs_pcb180_cadj_Log2 | numeric | 1055 | 0 (0.0%) |
| 19 | hs_sumPCBs5_cadj_Log2 | numeric | 1052 | 0 (0.0%) |
| 20 | hs_dep_cadj_Log2 | numeric | 1045 | 0 (0.0%) |
| 21 | hs_detp_cadj_Log2 | numeric | 1036 | 0 (0.0%) |
| 22 | hs_dmdtp_cdich_None | factor | | 0 (0.0%) |
| 23 | hs_dmp_cadj_Log2 | numeric | 1053 | 0 (0.0%) |
| 24 | hs_dmtp_cadj_Log2 | numeric | 1057 | 0 (0.0%) |
| 25 | hs_pbde153_cadj_Log2 | numeric | 1036 | 0 (0.0%) |
| 26 | hs_pbde47_cadj_Log2 | numeric | 1010 | 0 (0.0%) |
| 27 | hs_pfhxs_c_Log2 | numeric | 1061 | 0 (0.0%) |
| 28 | hs_pfna_c_Log2 | numeric | 1031 | 0 (0.0%) |
| 29 | hs_pfoa_c_Log2 | numeric | 1061 | 0 (0.0%) |
| 30 | hs_pfos_c_Log2 | numeric | 1050 | 0 (0.0%) |
| 31 | hs_pfunda_c_Log2 | numeric | 1044 | 0 (0.0%) |
| 32 | hs_bpa_cadj_Log2 | numeric | 1056 | 0 (0.0%) |
| 33 | hs_bupa_cadj_Log2 | numeric | 1034 | 0 (0.0%) |
| 34 | hs_etpa_cadj_Log2 | numeric | 1066 | 0 (0.0%) |
| 35 | hs_mepa_cadj_Log2 | numeric | 1052 | 0 (0.0%) |
| 36 | hs_oxbe_cadj_Log2 | numeric | 1069 | 0 (0.0%) |
| 37 | hs_prpa_cadj_Log2 | numeric | 1031 | 0 (0.0%) |
| 38 | hs_trcs_cadj_Log2 | numeric | 1053 | 0 (0.0%) |
| 39 | hs_mbzp_cadj_Log2 | numeric | 1046 | 0 (0.0%) |
| 40 | hs_mecpp_cadj_Log2 | numeric | 1037 | 0 (0.0%) |
| 41 | hs_mehhp_cadj_Log2 | numeric | 1050 | 0 (0.0%) |
| 42 | hs_mehp_cadj_Log2 | numeric | 1035 | 0 (0.0%) |
| 43 | hs_meohp_cadj_Log2 | numeric | 1057 | 0 (0.0%) |
| 44 | hs_mep_cadj_Log2 | numeric | 1075 | 0 (0.0%) |
| 45 | hs_mibp_cadj_Log2 | numeric | 1057 | 0 (0.0%) |
| 46 | hs_mnbp_cadj_Log2 | numeric | 1048 | 0 (0.0%) |
| 47 | hs_ohminp_cadj_Log2 | numeric | 1085 | 0 (0.0%) |
| 48 | hs_oxominp_cadj_Log2 | numeric | 1059 | 0 (0.0%) |
| 49 | hs_sumDEHP_cadj_Log2 | numeric | 1028 | 0 (0.0%) |
| 50 | FAS_cat_None | factor | | 0 (0.0%) |
| 51 | hs_contactfam_3cat_num_None | factor | | 0 (0.0%) |
| 52 | hs_hm_pers_None | numeric | | 0 (0.0%) |
| 53 | hs_participation_3cat_None | factor | | 0 (0.0%) |
| 54 | hs_cotinine_cdich_None | factor | | 0 (0.0%) |
| 55 | hs_globalexp2_None | factor | | 0 (0.0%) |
| 56 | hs_smk_parents_None | factor | | 0 (0.0%) |
Generated by summarytools 1.0.1 (R version 4.4.0)
2024-07-01
#separate numeric and categorical data
numeric_chemical <- chemical_exposome %>%
dplyr::select(where(is.numeric))
numeric_chemical_long <- pivot_longer(
numeric_chemical,
cols = everything(),
names_to = "variable",
values_to = "value"
)
unique_numerical_vars <- unique(numeric_chemical_long$variable)
num_plots <- lapply(unique_numerical_vars, function(var) {
data <- filter(numeric_chemical_long, variable == var)
p <- ggplot(data, aes(x = value)) +
geom_histogram(bins = 30, fill = "blue") +
labs(title = paste("Histogram of", var), x = "Value", y = "Count")
print(p)
return(p)
})
Arsenic (hs_as_c_Log2): This histogram shows a bimodal distribution of arsenic levels, with two prominent peaks. Such a distribution might suggest two different populations or sources of exposure among the study participants.
Cadmium (hs_cd_c_Log2): The distribution of cadmium levels is skewed to the right, indicating that most participants have lower exposure levels, with a few cases showing significantly higher exposures.
Cobalt (hs_co_c_Log2): The histogram of cobalt levels is roughly normal with a slight positive (right) skew. This may suggest a common source of exposure, with levels varying across the population.
Cesium (hs_cs_c_Log2): Exhibits a right-skewed distribution, indicating that most participants have relatively low exposure levels, but a small number have substantially higher exposures.
Copper (hs_cu_c_Log2): Shows a right-skewed distribution, suggesting that while most individuals have moderate exposure, a few experience significantly higher levels of copper.
Mercury (hs_hg_c_Log2): This distribution is also right-skewed, a pattern common for environmental pollutants: most participants have low exposure levels and a minority have high levels.
Manganese (hs_mn_c_Log2): The histogram for manganese is bell-shaped, consistent with an approximately normal distribution of (log2-transformed) manganese levels among the participants.
Molybdenum (hs_mo_c_Log2): Shows a distribution with a sharp peak and a long right tail, suggesting that while most people have similar exposure levels, a few have exceptionally high exposures.
Lead (hs_pb_c_Log2): The distribution is slightly right-skewed, indicating higher exposure levels in a smaller group of the population compared to the majority.
DDE (hs_dde_cadj_Log2): Shows a pronounced right skew, typical for chemicals that accumulate in the environment and in human tissues, indicating higher levels of exposure in a smaller subset of the population.
DDT (hs_ddt_cadj_Log2): This histogram displays a multi-modal distribution, suggesting different sources or durations of exposure among the population.
Hexachlorobenzene (hs_hcb_cadj_Log2): Exhibits a right-skewed distribution with a long tail, indicating that most people have lower exposure levels with some outliers experiencing very high exposures.
PCB 118, 138, 153 (hs_pcb118_cadj_Log2, hs_pcb138_cadj_Log2, hs_pcb153_cadj_Log2): All three PCBs show similar distributions with right skewness, suggesting that exposure to these compounds is higher among a smaller segment of the population.
PCB 170 and PCB 180: Both histograms show a pronounced right skew. Most samples have low concentrations of these chemicals and fewer show higher concentrations, suggesting that while most individuals have low exposure, a few may have considerably higher levels.
Sum of PCBs: The histogram is approximately normal and centered at a higher value than the individual PCBs, as expected for a variable that sums the five measured congeners (hs_sumPCBs5_cadj_Log2).
DEP, DETP, DMTP, PBDE 153, and PBDE 47: These histograms mostly show multimodal distributions (more than one peak). The multiple peaks may reflect different exposure sources, subgroups with distinct exposure levels, or differences in how these chemicals are metabolized or retained in the body. DMDTP, recorded only as detected vs. undetected, is summarized with the categorical variables below.
PFHxS, PFNA, and PFOA: These perfluorinated compounds show roughly bell-shaped but right-skewed distributions. This suggests a common source of exposure across the population, with some individuals experiencing higher exposures.
PFOS and PFUnDA: The histograms show a single, sharp peak with a rapid decline, indicating that most individuals have similar exposure levels. This may reflect common environmental sources or regulatory controls that limit variability.
BPA: The histogram is sharply peaked near zero (on the log2 scale) with a long right tail. Exposure is low for most children but substantially higher for a few, possibly because of specific product use or dietary sources.
MBZP (Monobenzyl Phthalate): This histogram shows a right-skewed distribution. Most values cluster at the lower end, indicating a common lower exposure level among subjects, with a long tail towards higher values suggesting occasional higher exposures.
MECPP (Mono-2-ethyl-5-carboxypentyl phthalate): The distribution is right-skewed like that of MBZP, but with a smoother decline. Again, most subjects have lower exposure levels while a few experience much higher exposures.
MEHHP (Mono-2-ethyl-5-hydroxyhexyl phthalate): Exhibits a unimodal distribution with a peak around a middle value and symmetric tails. This could indicate a more standardized exposure level among the subjects with some variation.
MEHP (Mono-2-ethylhexyl phthalate): Another right-skewed distribution, indicating that most subjects have lower exposure levels but a few have much higher levels.
MEOHP (Mono-2-ethyl-5-oxohexyl phthalate): This histogram shows a distribution with a peak around the middle values and a tail extending towards higher values, suggesting a central tendency with some higher exposures.
MEP (Mono-ethyl phthalate): The distribution is right-skewed, similar to others, showing most subjects with low to moderate levels of exposure, but a few have much higher levels.
OXOMiNP (hs_oxominp_cadj_Log2; mono-4-methyl-7-oxooctyl phthalate, an oxidized metabolite of diisononyl phthalate): This histogram shows a central peak with a fast decline. Values concentrate around a specific point, which might suggest a common exposure level among the subjects.
Sum of DEHP Metabolites: This shows a broad distribution with a peak towards the lower end, indicating varied exposure levels among the subjects, with most experiencing lower exposures.
Personal Care Product Use: The histogram displays a highly skewed distribution with multiple peaks, reflecting varied usage patterns among subjects, with some showing particularly high usage levels.
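The skew descriptions above are read off the histograms. The sketch below puts a number on them, assuming that the moment-based sample skewness g1 = m3 / m2^(3/2) is an adequate summary. `sample_skewness()` is a hypothetical helper, not part of the original analysis, and the toy frame is simulated. The exposures are already log2-transformed, so any skew that remains is skew on the log scale. On the real data, it could be applied as `sort(sapply(numeric_chemical, sample_skewness), decreasing = TRUE)`.

```r
# Hypothetical helper: moment-based sample skewness g1 = m3 / m2^(3/2).
# Positive values indicate a long right tail; values near 0 indicate symmetry.
sample_skewness <- function(v) {
  v  <- v[!is.na(v)]
  m  <- mean(v)
  m2 <- mean((v - m)^2)
  m3 <- mean((v - m)^3)
  m3 / m2^(3 / 2)
}

# Simulated stand-in for numeric_chemical (not HELIX values)
set.seed(42)
toy <- data.frame(
  symmetric    = rnorm(500),
  right_skewed = rexp(500)
)
skew <- sapply(toy, sample_skewness)
round(sort(skew, decreasing = TRUE), 2)
```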
categorical_chemical <- chemical_exposome %>%
dplyr::select(where(is.factor))
categorical_chemical_long <- pivot_longer(
categorical_chemical,
cols = everything(),
names_to = "variable",
values_to = "value"
)
unique_categorical_vars <- unique(categorical_chemical_long$variable)
categorical_plots <- lapply(unique_categorical_vars, function(var) {
data <- filter(categorical_chemical_long, variable == var)
p <- ggplot(data, aes(x = value, fill = value)) +
geom_bar(stat = "count") +
labs(title = paste("Distribution of", var), x = var, y = "Count")
print(p)
return(p)
})
hs_tl_cdich_None (thallium, detected vs. undetected): Most samples had undetectable thallium, and only a small fraction showed detection.
hs_dmdtp_cdich_None (DMDTP, detected vs. undetected): As with thallium, most samples were undetected, but the detected fraction is larger.
FAS_cat_None (Family Affluence Scale categories: Low, Middle, High): The high-affluence group is the largest, followed by the middle group, with the fewest families in the low category.
hs_contactfam_3cat_num_None (Frequency of contact with family): Most individuals reported daily (almost daily) contact with family, a smaller number reported weekly contact, and the fewest reported less frequent than weekly contact.
hs_participation_3cat_None (Participation in organisations): A large number of individuals do not participate in any organisation, a substantial number participate in one, and a smaller group in two or more.
hs_cotinine_cdich_None (detected vs. undetected): Cotinine, a nicotine metabolite, was detected in a high proportion of samples, indicating nicotine exposure in a substantial share of children.
hs_globalexp2_None (global exposure of the child to environmental tobacco smoke: exposure vs. no exposure): More children had no exposure than had exposure.
hs_smk_parents_None (Smoking status of parents - Both, Neither, One): The largest group reported that neither parent smokes, a significant number reported one smoking parent, and the smallest group reported both parents smoke.
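The bar-chart descriptions above can be backed by exact percentages. The minimal sketch below uses a hypothetical helper, `level_shares()`, and made-up toy factors. It returns the share of each level, including NA if present, for every factor column. Calling `level_shares(categorical_chemical)` would apply it to the real data:

```r
# Hypothetical helper: percentage of observations in each level of every factor column.
level_shares <- function(df) {
  factor_cols <- Filter(is.factor, df)  # keep only factor columns
  lapply(factor_cols, function(f) round(100 * prop.table(table(f, useNA = "ifany")), 1))
}

# Toy data (illustrative labels, not HELIX codes)
toy <- data.frame(
  detected = factor(c("Undetected", "Undetected", "Undetected", "Detected")),
  smokers  = factor(c("Neither", "One", "Neither", "Both"))
)
shares <- level_shares(toy)
shares$detected
```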
numeric_chemical <- select_if(chemical_exposome, is.numeric)
# keep Pearson and Spearman under separate names; the Spearman matrix is plotted
cor_matrix_pearson <- cor(numeric_chemical, method = "pearson")
cor_matrix <- cor(numeric_chemical, method = "spearman")
custom_color_scale <- list(
c(0, "darkred"),
c(0.5, "white"),
c(1, "darkblue")
)
plot_ly(
z = cor_matrix,
x = colnames(cor_matrix),
y = colnames(cor_matrix),
type = "heatmap",
colorscale = custom_color_scale,
zmin = -1, # fix the color range so white marks a correlation of 0
zmax = 1
) %>%
layout(
title = "Correlation Matrix",
xaxis = list(tickangle = -90),
yaxis = list(side = "left")
)
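With about fifty variables, the heatmap is hard to scan for the strongest associations. The sketch below defines a hypothetical helper, `top_pairs()`, which lists each pair once (upper triangle only), sorted by absolute correlation. It could be called as `top_pairs(cor_matrix, k = 15)`. The toy data are simulated:

```r
# Hypothetical helper: the k most strongly correlated pairs in a correlation matrix.
top_pairs <- function(cm, k = 10) {
  idx <- which(upper.tri(cm), arr.ind = TRUE)  # each pair once, diagonal excluded
  out <- data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    rho  = cm[idx]
  )
  out <- out[order(-abs(out$rho)), ]          # strongest (positive or negative) first
  head(out, k)
}

# Simulated example: b is a noisy copy of a; c is unrelated
set.seed(7)
a <- rnorm(100)
toy <- data.frame(a = a, b = a + rnorm(100, sd = 0.1), c = rnorm(100))
cm <- cor(toy, method = "spearman")
top_pairs(cm, k = 2)
```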
summarytools::view(dfSummary(covariates, style = 'grid', plain.ascii = FALSE, valid.col = FALSE, headings = FALSE), method = "render")
| No | Variable | Type | Distinct values | Missing |
|---|---|---|---|---|
| 1 | ID | integer | 1301 (integer sequence) | 0 (0.0%) |
| 2 | h_cohort | factor | | 0 (0.0%) |
| 3 | e3_sex_None | factor | | 0 (0.0%) |
| 4 | e3_yearbir_None | factor | | 0 (0.0%) |
| 5 | h_mbmi_None | numeric | 853 | 0 (0.0%) |
| 6 | hs_wgtgain_None | numeric | 49 | 0 (0.0%) |
| 7 | e3_gac_None | numeric | 72 | 0 (0.0%) |
| 8 | h_age_None | numeric | 665 | 0 (0.0%) |
| 9 | h_edumc_None | factor | | 0 (0.0%) |
| 10 | h_native_None | factor | | 0 (0.0%) |
| 11 | h_parity_None | factor | | 0 (0.0%) |
| 12 | hs_child_age_None | numeric | 879 | 0 (0.0%) |
| 13 | hs_c_height_None | numeric | 311 | 0 (0.0%) |
| 14 | hs_c_weight_None | numeric | 311 | 0 (0.0%) |
Generated by summarytools 1.0.1 (R version 4.4.0)
2024-07-01
#separate numeric and categorical data
numeric_covariates <- covariates %>%
dplyr::select(where(is.numeric))
numeric_covariates_long <- pivot_longer(
numeric_covariates,
cols = everything(),
names_to = "variable",
values_to = "value"
)
unique_numerical_vars <- unique(numeric_covariates_long$variable)
num_plots <- lapply(unique_numerical_vars, function(var) {
data <- filter(numeric_covariates_long, variable == var)
p <- ggplot(data, aes(x = value)) +
geom_histogram(bins = 30, fill = "blue") +
labs(title = paste("Histogram of", var), x = "Value", y = "Count")
print(p)
return(p)
})
ID: The histogram is flat (uniform), as expected for identifiers assigned as an integer sequence. ID therefore carries no analytical information.
Maternal BMI (h_mbmi): The distribution of maternal BMI is roughly normal but slightly right-skewed, with a longer tail toward higher BMI values. The peak in the 25-30 range shows that values concentrate there, which is typical for adult populations.
Weight Gain (hs_wgtgain): The distribution of weight gain is bimodal, with one peak around 10 and another around 20. This could indicate two common patterns of, or recommendations for, weight gain, most plausibly during pregnancy.
Gestational Age at Childbirth (e3_gac): The distribution has a sharp peak at about 40 weeks, as expected for full-term pregnancies, showing that most births occur at this gestational age.
Maternal Age (h_age): This histogram shows a roughly normal distribution with a peak around the early 30s, suggesting that this is the most common age range for the mothers in the dataset.
Child’s Age (hs_child_age): This histogram is multimodal, reflecting several peaks across different ages. This could be indicative of the data collection points or particular age groups being studied.
Child’s Height (hs_c_height): The data is approximately normally distributed with a slight right skew. The majority of the measurements cluster around the mean, which suggests typical growth patterns.
Child’s Weight (hs_c_weight): This histogram is right-skewed, indicating that while most children’s weights are within a normal range, there is a long tail of children who weigh more, which might suggest variations in growth or cases of overweight.
categorical_covariates <- covariates %>%
dplyr::select(where(is.factor))
categorical_covariates_long <- pivot_longer(
categorical_covariates,
cols = everything(),
names_to = "variable",
values_to = "value"
)
unique_categorical_vars <- unique(categorical_covariates_long$variable)
categorical_plots <- lapply(unique_categorical_vars, function(var) {
data <- filter(categorical_covariates_long, variable == var)
p <- ggplot(data, aes(x = value, fill = value)) +
geom_bar(stat = "count") +
labs(title = paste("Distribution of", var), x = var, y = "Count")
print(p)
return(p)
})
Cohorts (h_cohort): The distribution shows the count of subjects across six different cohorts. All cohorts have a substantial number of subjects, with cohort 5 showing the highest participation.
Gender Distribution (e3_sex): The distribution is nearly balanced, with slightly more males than females.
Year of Birth (e3_yearbir): Most subjects were born in the later years, with a marked increase in 2009, perhaps reflecting larger recruitment or a specific cohort focus that year.
Educational Level (h_edumc): The variable has three categories of educational attainment. Category 3 has the highest count, which suggests a higher level of education among most subjects' parents if category 3 denotes the highest level.
Parental Native Status (h_native): Shows counts by the parents' native-country status. The majority fall in category 2.
Parity (h_parity): Classifies mothers by the number of children they had before this birth. The largest group had no previous children, followed by those with one, and a smaller group with two.
numeric_covariate <- select_if(covariates, is.numeric)
# keep Pearson and Spearman under separate names; the Spearman matrix is plotted
cor_matrix_pearson <- cor(numeric_covariate, method = "pearson")
cor_matrix <- cor(numeric_covariate, method = "spearman")
corrplot(cor_matrix, method = "circle")
outcome_BMI <- phenotype %>%
dplyr::select(hs_zbmi_who)
summarytools::view(dfSummary(outcome_BMI, style = 'grid', plain.ascii = FALSE, valid.col = FALSE, headings = FALSE), method = "render")
| No | Variable | Type | Distinct values | Missing |
|---|---|---|---|---|
| 1 | hs_zbmi_who | numeric | 421 | 0 (0.0%) |
Generated by summarytools 1.0.1 (R version 4.4.0)
2024-07-01
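The variable name suggests that hs_zbmi_who is a WHO BMI-for-age z-score. This is an assumption, since its codebook entry is not shown here. WHO growth references convert a measurement y to a z-score with the LMS method, using age- and sex-specific parameters: L (Box-Cox power), M (median) and S (coefficient of variation). The minimal sketch below uses hypothetical L, M, S values, not real WHO parameters. It also omits WHO's restricted calculation for values beyond ±3 SD:

```r
# LMS z-score: z = ((y / M)^L - 1) / (L * S), or log(y / M) / S when L == 0.
# L, M, S are scalars for a single age/sex reference cell.
lms_z <- function(y, L, M, S) {
  if (L == 0) log(y / M) / S else ((y / M)^L - 1) / (L * S)
}

L <- -1.5; M <- 16.0; S <- 0.12   # hypothetical LMS parameters (not WHO values)
z <- lms_z(c(14, 16, 20), L, M, S)
round(z, 2)                        # a BMI equal to the median M maps to z = 0
```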
outcome_cov <- cbind(covariates, outcome_BMI)
outcome_cov <- outcome_cov[, !duplicated(colnames(outcome_cov))]
outcome_cov <- outcome_cov %>%
dplyr::select(hs_child_age_None, h_cohort, e3_sex_None, e3_yearbir_None, h_edumc_None, h_native_None, hs_zbmi_who)
summary_table <- dfSummary(outcome_cov,
varnumbers = TRUE,
valid.col = FALSE,
graph.col = TRUE,
style = "multiline")
print(summary_table, method = "render", plain.ascii = FALSE, style = "grid")
| No | Variable | Type | Distinct values | Missing |
|---|---|---|---|---|
| 1 | hs_child_age_None | numeric | 879 | 0 (0.0%) |
| 2 | h_cohort | factor | | 0 (0.0%) |
| 3 | e3_sex_None | factor | | 0 (0.0%) |
| 4 | e3_yearbir_None | factor | | 0 (0.0%) |
| 5 | h_edumc_None | factor | | 0 (0.0%) |
| 6 | h_native_None | factor | | 0 (0.0%) |
| 7 | hs_zbmi_who | numeric | 421 | 0 (0.0%) |
Generated by summarytools 1.0.1 (R version 4.4.0)
2024-07-01
#the full chemicals list
chemicals_full <- c(
"hs_as_c_Log2",
"hs_cd_c_Log2",
"hs_co_c_Log2",
"hs_cs_c_Log2",
"hs_cu_c_Log2",
"hs_hg_c_Log2",
"hs_mn_c_Log2",
"hs_mo_c_Log2",
"hs_pb_c_Log2",
"hs_tl_cdich_None",
"hs_dde_cadj_Log2",
"hs_ddt_cadj_Log2",
"hs_hcb_cadj_Log2",
"hs_pcb118_cadj_Log2",
"hs_pcb138_cadj_Log2",
"hs_pcb153_cadj_Log2",
"hs_pcb170_cadj_Log2",
"hs_pcb180_cadj_Log2",
"hs_dep_cadj_Log2",
"hs_detp_cadj_Log2",
"hs_dmdtp_cdich_None",
"hs_dmp_cadj_Log2",
"hs_dmtp_cadj_Log2",
"hs_pbde153_cadj_Log2",
"hs_pbde47_cadj_Log2",
"hs_pfhxs_c_Log2",
"hs_pfna_c_Log2",
"hs_pfoa_c_Log2",
"hs_pfos_c_Log2",
"hs_pfunda_c_Log2",
"hs_bpa_cadj_Log2",
"hs_bupa_cadj_Log2",
"hs_etpa_cadj_Log2",
"hs_mepa_cadj_Log2",
"hs_oxbe_cadj_Log2",
"hs_prpa_cadj_Log2",
"hs_trcs_cadj_Log2",
"hs_mbzp_cadj_Log2",
"hs_mecpp_cadj_Log2",
"hs_mehhp_cadj_Log2",
"hs_mehp_cadj_Log2",
"hs_meohp_cadj_Log2",
"hs_mep_cadj_Log2",
"hs_mibp_cadj_Log2",
"hs_mnbp_cadj_Log2",
"hs_ohminp_cadj_Log2",
"hs_oxominp_cadj_Log2",
"FAS_cat_None",
"hs_contactfam_3cat_num_None",
"hs_hm_pers_None",
"hs_participation_3cat_None",
"hs_cotinine_cdich_None",
"hs_globalexp2_None",
"hs_smk_parents_None"
)
# postnatal diet for child (note: the "_preg_" in h_legume_preg_Ter suggests a pregnancy-period variable)
postnatal_diet <- c(
"h_bfdur_Ter",
"hs_bakery_prod_Ter",
"hs_beverages_Ter",
"hs_break_cer_Ter",
"hs_caff_drink_Ter",
"hs_dairy_Ter",
"hs_fastfood_Ter",
"h_legume_preg_Ter",
"hs_org_food_Ter",
"hs_proc_meat_Ter",
"hs_readymade_Ter",
"hs_total_bread_Ter",
"hs_total_cereal_Ter",
"hs_total_fish_Ter",
"hs_total_fruits_Ter",
"hs_total_lipids_Ter",
"hs_total_meat_Ter",
"hs_total_potatoes_Ter",
"hs_total_sweets_Ter",
"hs_total_veg_Ter",
"hs_total_yog_Ter"
)
all_columns <- c(chemicals_full, postnatal_diet)
extracted_exposome <- exposome %>% dplyr::select(all_of(all_columns))
head(extracted_exposome)
selected_data <- cbind(outcome_cov, extracted_exposome)
head(selected_data)
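`cv.glmnet()` needs a numeric matrix, and the LASSO step below builds one with `as.matrix()`. Several columns in these data were summarized as [factor]. If any are still factors in selected_data, `as.matrix()` returns a character matrix; whether they were converted earlier is not shown here. The toy frame below illustrates what the base-R conversions produce. `data.matrix()` substitutes integer level codes, which treats categories as ordered numbers and gives one coefficient per variable. `model.matrix()` instead creates one 0/1 indicator per non-reference level:

```r
# Toy frame mixing a numeric column and a factor (illustrative only)
df <- data.frame(
  zbmi   = c(0.5, -1.2, 2.1),
  cohort = factor(c("A", "B", "A"))
)
m_char  <- as.matrix(df)                      # any factor column makes the whole matrix character
m_codes <- data.matrix(df)                    # factor -> integer level codes (1, 2, 1)
m_dummy <- model.matrix(~ ., data = df)[, -1] # factor -> indicator columns; intercept dropped
typeof(m_char); typeof(m_codes); colnames(m_dummy)
```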
selected_data_corr <- select_if(selected_data, is.numeric)
# keep Pearson and Spearman under separate names; the Spearman matrix is plotted
cor_matrix_pearson <- cor(selected_data_corr, method = "pearson")
cor_matrix <- cor(selected_data_corr, method = "spearman")
custom_color_scale <- list(
c(0, "darkred"),
c(0.5, "white"),
c(1, "darkblue")
)
plot_ly(
z = cor_matrix,
x = colnames(cor_matrix),
y = colnames(cor_matrix),
type = "heatmap",
colorscale = custom_color_scale,
zmin = -1, # fix the color range so white marks a correlation of 0
zmax = 1
) %>%
layout(
title = "Correlation Matrix",
xaxis = list(tickangle = -90),
yaxis = list(side = "left")
)
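# LASSO with a 70/30 train/test split. The "with covariates" model uses every column of selected_data except hs_zbmi_who
# (covariates, chemicals and postnatal diet); the "without covariates" model uses chemicals_full only, so it also drops the diet variables.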
set.seed(101)
train_indices <- sample(seq_len(nrow(selected_data)), size = floor(0.7 * nrow(selected_data)))
test_indices <- setdiff(seq_len(nrow(selected_data)), train_indices)
x_train <- as.matrix(selected_data[train_indices, setdiff(names(selected_data), "hs_zbmi_who")])
y_train <- selected_data$hs_zbmi_who[train_indices]
x_test <- as.matrix(selected_data[test_indices, setdiff(names(selected_data), "hs_zbmi_who")])
y_test <- selected_data$hs_zbmi_who[test_indices]
fit_with_covariates_train <- cv.glmnet(x_train, y_train, alpha = 1, family = "gaussian")
fit_with_covariates_test <- predict(fit_with_covariates_train, s = "lambda.min", newx = x_test)
test_mse_with_covariates <- mean((y_test - fit_with_covariates_test)^2)
x_train_chemicals_only <- as.matrix(selected_data[train_indices, chemicals_full])
x_test_chemicals_only <- as.matrix(selected_data[test_indices, chemicals_full])
fit_without_covariates_train <- cv.glmnet(x_train_chemicals_only, y_train, alpha = 1, family = "gaussian")
fit_without_covariates_test <- predict(fit_without_covariates_train, s = "lambda.min", newx = x_test_chemicals_only)
test_mse_without_covariates <- mean((y_test - fit_without_covariates_test)^2)
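# plot() on a cv.glmnet object draws the CV error curve; the coefficient path comes from the underlying glmnet fit
plot(fit_with_covariates_train$glmnet.fit, xvar = "lambda", main = "Coefficients Path (With Covariates)")
plot(fit_without_covariates_train$glmnet.fit, xvar = "lambda", main = "Coefficients Path (Without Covariates)")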
best_lambda <- fit_with_covariates_train$lambda.min # lambda that minimizes the MSE
coef(fit_with_covariates_train, s = best_lambda) # coefficients at the chosen lambda
## 82 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -5.5678784940
## hs_child_age_None .
## h_cohort 0.0695567858
## e3_sex_None .
## e3_yearbir_None .
## h_edumc_None .
## h_native_None 0.0511006743
## hs_as_c_Log2 .
## hs_cd_c_Log2 -0.0336774443
## hs_co_c_Log2 -0.0239636201
## hs_cs_c_Log2 0.0820543441
## hs_cu_c_Log2 0.6610426794
## hs_hg_c_Log2 -0.0017088529
## hs_mn_c_Log2 .
## hs_mo_c_Log2 -0.1156505312
## hs_pb_c_Log2 .
## hs_tl_cdich_None .
## hs_dde_cadj_Log2 -0.0657354802
## hs_ddt_cadj_Log2 .
## hs_hcb_cadj_Log2 .
## hs_pcb118_cadj_Log2 .
## hs_pcb138_cadj_Log2 .
## hs_pcb153_cadj_Log2 -0.1856578642
## hs_pcb170_cadj_Log2 -0.0573285421
## hs_pcb180_cadj_Log2 .
## hs_dep_cadj_Log2 -0.0193032245
## hs_detp_cadj_Log2 .
## hs_dmdtp_cdich_None .
## hs_dmp_cadj_Log2 .
## hs_dmtp_cadj_Log2 .
## hs_pbde153_cadj_Log2 -0.0328956540
## hs_pbde47_cadj_Log2 .
## hs_pfhxs_c_Log2 .
## hs_pfna_c_Log2 .
## hs_pfoa_c_Log2 -0.0993929110
## hs_pfos_c_Log2 -0.0755069975
## hs_pfunda_c_Log2 .
## hs_bpa_cadj_Log2 .
## hs_bupa_cadj_Log2 .
## hs_etpa_cadj_Log2 .
## hs_mepa_cadj_Log2 .
## hs_oxbe_cadj_Log2 0.0006650708
## hs_prpa_cadj_Log2 0.0057866608
## hs_trcs_cadj_Log2 0.0003819532
## hs_mbzp_cadj_Log2 0.0347966360
## hs_mecpp_cadj_Log2 .
## hs_mehhp_cadj_Log2 .
## hs_mehp_cadj_Log2 .
## hs_meohp_cadj_Log2 .
## hs_mep_cadj_Log2 .
## hs_mibp_cadj_Log2 -0.0244119191
## hs_mnbp_cadj_Log2 -0.0243769631
## hs_ohminp_cadj_Log2 .
## hs_oxominp_cadj_Log2 .
## FAS_cat_None .
## hs_contactfam_3cat_num_None .
## hs_hm_pers_None -0.0028088257
## hs_participation_3cat_None .
## hs_cotinine_cdich_None .
## hs_globalexp2_None .
## hs_smk_parents_None .
## h_bfdur_Ter .
## hs_bakery_prod_Ter .
## hs_beverages_Ter .
## hs_break_cer_Ter .
## hs_caff_drink_Ter .
## hs_dairy_Ter .
## hs_fastfood_Ter .
## h_legume_preg_Ter .
## hs_org_food_Ter .
## hs_proc_meat_Ter .
## hs_readymade_Ter .
## hs_total_bread_Ter .
## hs_total_cereal_Ter .
## hs_total_fish_Ter .
## hs_total_fruits_Ter .
## hs_total_lipids_Ter .
## hs_total_meat_Ter .
## hs_total_potatoes_Ter .
## hs_total_sweets_Ter .
## hs_total_veg_Ter .
## hs_total_yog_Ter .
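In the output above, "." marks coefficients that LASSO set exactly to zero at lambda.min. A textbook case shows why LASSO zeroes coefficients while ridge only shrinks them. Assume an orthonormal design (X'X = I) and let z be the OLS coefficients. Minimizing (1/2)||y - Xb||^2 + lambda * sum|b| then soft-thresholds each z by lambda. Minimizing (1/2)||y - Xb||^2 + (lambda/2) * sum b^2 instead divides each z by (1 + lambda). This is a sketch under that simplifying assumption with illustrative numbers. glmnet scales the residual sum of squares by 1/(2n) and standardizes predictors by default, so its lambda is not numerically comparable to the one here:

```r
# Orthonormal-design closed forms (z = OLS coefficients):
#   LASSO: soft-threshold z by lambda -> small coefficients become exactly 0
#   ridge: scale z by 1 / (1 + lambda) -> every coefficient shrinks, none is zeroed
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)
ridge_shrink   <- function(z, lambda) z / (1 + lambda)

z <- c(-0.30, -0.05, 0.02, 0.66)   # illustrative OLS coefficients
lambda <- 0.1
rbind(ols   = z,
      lasso = soft_threshold(z, lambda),
      ridge = ridge_shrink(z, lambda))
```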
best_lambda <- fit_without_covariates_train$lambda.min # lambda that minimizes the MSE
coef(fit_without_covariates_train, s = best_lambda)
## 55 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -5.1837493309
## hs_as_c_Log2 .
## hs_cd_c_Log2 -0.0284173280
## hs_co_c_Log2 -0.0142399006
## hs_cs_c_Log2 0.1021877004
## hs_cu_c_Log2 0.6609749636
## hs_hg_c_Log2 -0.0171114022
## hs_mn_c_Log2 .
## hs_mo_c_Log2 -0.1088122852
## hs_pb_c_Log2 -0.0217220451
## hs_tl_cdich_None .
## hs_dde_cadj_Log2 -0.0420566432
## hs_ddt_cadj_Log2 .
## hs_hcb_cadj_Log2 .
## hs_pcb118_cadj_Log2 .
## hs_pcb138_cadj_Log2 .
## hs_pcb153_cadj_Log2 -0.1717111807
## hs_pcb170_cadj_Log2 -0.0597612863
## hs_pcb180_cadj_Log2 .
## hs_dep_cadj_Log2 -0.0212245496
## hs_detp_cadj_Log2 0.0001951293
## hs_dmdtp_cdich_None .
## hs_dmp_cadj_Log2 .
## hs_dmtp_cadj_Log2 .
## hs_pbde153_cadj_Log2 -0.0361530310
## hs_pbde47_cadj_Log2 .
## hs_pfhxs_c_Log2 -0.0102171741
## hs_pfna_c_Log2 .
## hs_pfoa_c_Log2 -0.1415449389
## hs_pfos_c_Log2 -0.0486025276
## hs_pfunda_c_Log2 .
## hs_bpa_cadj_Log2 .
## hs_bupa_cadj_Log2 .
## hs_etpa_cadj_Log2 .
## hs_mepa_cadj_Log2 -0.0027645744
## hs_oxbe_cadj_Log2 0.0060056008
## hs_prpa_cadj_Log2 0.0039341981
## hs_trcs_cadj_Log2 .
## hs_mbzp_cadj_Log2 0.0499456100
## hs_mecpp_cadj_Log2 .
## hs_mehhp_cadj_Log2 .
## hs_mehp_cadj_Log2 .
## hs_meohp_cadj_Log2 .
## hs_mep_cadj_Log2 .
## hs_mibp_cadj_Log2 -0.0559548066
## hs_mnbp_cadj_Log2 -0.0134892925
## hs_ohminp_cadj_Log2 .
## hs_oxominp_cadj_Log2 .
## FAS_cat_None .
## hs_contactfam_3cat_num_None .
## hs_hm_pers_None -0.0161154323
## hs_participation_3cat_None .
## hs_cotinine_cdich_None .
## hs_globalexp2_None .
## hs_smk_parents_None .
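The exposures enter on the log2 scale, so each coefficient is the estimated change in BMI z-score per doubling of that exposure, holding the other predictors fixed. A k-fold change corresponds to beta * log2(k). The sketch below uses the copper coefficient from the output above; `fold_change_effect()` is a hypothetical helper. Note that these are penalized, shrunken estimates from a predictive model, not adjusted causal effects:

```r
# Implied change in BMI z-score for a k-fold change in a log2-scaled exposure.
fold_change_effect <- function(beta, fold) beta * log2(fold)

beta_cu <- 0.6609749636  # hs_cu_c_Log2, model without covariates (output above)
round(fold_change_effect(beta_cu, c(2, 1.1, 0.5)), 3)  # doubling, +10%, halving
```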
cat("Model with Covariates - Test MSE:", test_mse_with_covariates, "\n")
## Model with Covariates - Test MSE: 1.185848
cat("Model without Covariates - Test MSE:", test_mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.22833
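Test MSE is easier to judge against the spread of the outcome in the test set. The sketch below computes out-of-sample R^2 = 1 - SSE/SST, using the test-set mean as the baseline; some authors use the training mean instead. `oos_r2()` is a hypothetical helper, and the numbers are toy values. With the objects above, it would be called as `oos_r2(y_test, as.vector(fit_with_covariates_test))`. Values near zero mean the model predicts little better than a constant:

```r
# Out-of-sample R^2: share of test-set variance explained by the predictions.
oos_r2 <- function(y, y_hat) {
  1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)
}

y     <- c(0.2, -1.0, 1.5, 0.3)   # toy observed values
y_hat <- c(0.1, -0.8, 1.2, 0.4)   # toy predictions
oos_r2(y, y_hat)
```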
# RIDGE
fit_with_covariates_train <- cv.glmnet(x_train, y_train, alpha = 0, family = "gaussian")
fit_with_covariates_test <- predict(fit_with_covariates_train, s = "lambda.min", newx = x_test)
test_mse_with_covariates <- mean((y_test - fit_with_covariates_test)^2)
x_train_chemicals_only <- as.matrix(selected_data[train_indices, chemicals_full])
x_test_chemicals_only <- as.matrix(selected_data[test_indices, chemicals_full])
fit_without_covariates_train <- cv.glmnet(x_train_chemicals_only, y_train, alpha = 0, family = "gaussian")
fit_without_covariates_test <- predict(fit_without_covariates_train, s = "lambda.min", newx = x_test_chemicals_only)
test_mse_without_covariates <- mean((y_test - fit_without_covariates_test)^2)
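# plot() on a cv.glmnet object draws the CV error curve; the coefficient paths come from the underlying glmnet fit
plot(fit_with_covariates_train$glmnet.fit, xvar = "lambda", main = "Coefficient Path (With Covariates)")
plot(fit_without_covariates_train$glmnet.fit, xvar = "lambda", main = "Coefficient Path (Without Covariates)")
best_lambda <- fit_with_covariates_train$lambda.min # lambda that minimizes the cross-validated MSE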
coef(fit_with_covariates_train, s = best_lambda) # coefficients at the chosen lambda
## 82 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -42.160955638
## hs_child_age_None -0.024653654
## h_cohort 0.056124306
## e3_sex_None .
## e3_yearbir_None 0.018777525
## h_edumc_None 0.016255702
## h_native_None 0.071773042
## hs_as_c_Log2 0.006364945
## hs_cd_c_Log2 -0.042733563
## hs_co_c_Log2 -0.062828135
## hs_cs_c_Log2 0.133399431
## hs_cu_c_Log2 0.609965500
## hs_hg_c_Log2 -0.023639135
## hs_mn_c_Log2 -0.024596407
## hs_mo_c_Log2 -0.115214282
## hs_pb_c_Log2 -0.026511118
## hs_tl_cdich_None .
## hs_dde_cadj_Log2 -0.065049721
## hs_ddt_cadj_Log2 0.001881727
## hs_hcb_cadj_Log2 -0.032088608
## hs_pcb118_cadj_Log2 0.026419517
## hs_pcb138_cadj_Log2 -0.039147083
## hs_pcb153_cadj_Log2 -0.129106673
## hs_pcb170_cadj_Log2 -0.051481423
## hs_pcb180_cadj_Log2 -0.011929531
## hs_dep_cadj_Log2 -0.024659540
## hs_detp_cadj_Log2 0.006320785
## hs_dmdtp_cdich_None .
## hs_dmp_cadj_Log2 -0.002412305
## hs_dmtp_cadj_Log2 0.001052736
## hs_pbde153_cadj_Log2 -0.031205814
## hs_pbde47_cadj_Log2 0.009911956
## hs_pfhxs_c_Log2 -0.005219457
## hs_pfna_c_Log2 0.005003013
## hs_pfoa_c_Log2 -0.135134818
## hs_pfos_c_Log2 -0.072775595
## hs_pfunda_c_Log2 0.010645653
## hs_bpa_cadj_Log2 -0.004906553
## hs_bupa_cadj_Log2 0.004399280
## hs_etpa_cadj_Log2 -0.006127763
## hs_mepa_cadj_Log2 -0.015191889
## hs_oxbe_cadj_Log2 0.009931284
## hs_prpa_cadj_Log2 0.013906415
## hs_trcs_cadj_Log2 0.011135502
## hs_mbzp_cadj_Log2 0.054325547
## hs_mecpp_cadj_Log2 -0.009161963
## hs_mehhp_cadj_Log2 0.012553175
## hs_mehp_cadj_Log2 -0.014601711
## hs_meohp_cadj_Log2 0.003810181
## hs_mep_cadj_Log2 0.014463820
## hs_mibp_cadj_Log2 -0.042581695
## hs_mnbp_cadj_Log2 -0.052773051
## hs_ohminp_cadj_Log2 -0.024175703
## hs_oxominp_cadj_Log2 0.020047912
## FAS_cat_None .
## hs_contactfam_3cat_num_None .
## hs_hm_pers_None -0.023634190
## hs_participation_3cat_None .
## hs_cotinine_cdich_None .
## hs_globalexp2_None .
## hs_smk_parents_None .
## h_bfdur_Ter .
## hs_bakery_prod_Ter .
## hs_beverages_Ter .
## hs_break_cer_Ter .
## hs_caff_drink_Ter .
## hs_dairy_Ter .
## hs_fastfood_Ter .
## h_legume_preg_Ter .
## hs_org_food_Ter .
## hs_proc_meat_Ter .
## hs_readymade_Ter .
## hs_total_bread_Ter .
## hs_total_cereal_Ter .
## hs_total_fish_Ter .
## hs_total_fruits_Ter .
## hs_total_lipids_Ter .
## hs_total_meat_Ter .
## hs_total_potatoes_Ter .
## hs_total_sweets_Ter .
## hs_total_veg_Ter .
## hs_total_yog_Ter .
best_lambda <- fit_without_covariates_train$lambda.min # lambda that minimizes the cross-validated MSE
coef(fit_without_covariates_train, s = best_lambda)
## 55 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -4.239624e+00
## hs_as_c_Log2 6.052451e-03
## hs_cd_c_Log2 -3.886611e-02
## hs_co_c_Log2 -4.854412e-02
## hs_cs_c_Log2 1.151068e-01
## hs_cu_c_Log2 5.959543e-01
## hs_hg_c_Log2 -3.122837e-02
## hs_mn_c_Log2 -3.082238e-02
## hs_mo_c_Log2 -1.046374e-01
## hs_pb_c_Log2 -4.878458e-02
## hs_tl_cdich_None .
## hs_dde_cadj_Log2 -4.719029e-02
## hs_ddt_cadj_Log2 3.820665e-03
## hs_hcb_cadj_Log2 -1.971507e-02
## hs_pcb118_cadj_Log2 1.201408e-02
## hs_pcb138_cadj_Log2 -3.868824e-02
## hs_pcb153_cadj_Log2 -1.193205e-01
## hs_pcb170_cadj_Log2 -5.095402e-02
## hs_pcb180_cadj_Log2 -1.203014e-02
## hs_dep_cadj_Log2 -2.456945e-02
## hs_detp_cadj_Log2 7.811685e-03
## hs_dmdtp_cdich_None .
## hs_dmp_cadj_Log2 -2.075000e-03
## hs_dmtp_cadj_Log2 2.511909e-04
## hs_pbde153_cadj_Log2 -3.225764e-02
## hs_pbde47_cadj_Log2 5.263678e-03
## hs_pfhxs_c_Log2 -3.100931e-02
## hs_pfna_c_Log2 2.070402e-02
## hs_pfoa_c_Log2 -1.487317e-01
## hs_pfos_c_Log2 -6.238062e-02
## hs_pfunda_c_Log2 1.141304e-02
## hs_bpa_cadj_Log2 -9.692717e-05
## hs_bupa_cadj_Log2 6.208731e-03
## hs_etpa_cadj_Log2 -6.434128e-03
## hs_mepa_cadj_Log2 -1.573173e-02
## hs_oxbe_cadj_Log2 1.308651e-02
## hs_prpa_cadj_Log2 1.226900e-02
## hs_trcs_cadj_Log2 2.731754e-03
## hs_mbzp_cadj_Log2 5.356129e-02
## hs_mecpp_cadj_Log2 2.546898e-03
## hs_mehhp_cadj_Log2 1.984704e-02
## hs_mehp_cadj_Log2 -1.470314e-02
## hs_meohp_cadj_Log2 1.126869e-02
## hs_mep_cadj_Log2 3.543863e-03
## hs_mibp_cadj_Log2 -5.228014e-02
## hs_mnbp_cadj_Log2 -4.190152e-02
## hs_ohminp_cadj_Log2 -2.737398e-02
## hs_oxominp_cadj_Log2 2.144488e-02
## FAS_cat_None .
## hs_contactfam_3cat_num_None .
## hs_hm_pers_None -3.221832e-02
## hs_participation_3cat_None .
## hs_cotinine_cdich_None .
## hs_globalexp2_None .
## hs_smk_parents_None .
cat("Model with Covariates - Test MSE:", test_mse_with_covariates, "\n")
## Model with Covariates - Test MSE: 1.145497
cat("Model without Covariates - Test MSE:", test_mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.186358
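In the ridge fit with covariates, `e3_sex_None`, every dietary `_Ter` variable, and the categorical exposures (`hs_tl_cdich_None`, `FAS_cat_None`, `hs_smk_parents_None`, and others) are printed as `.`, i.e. exactly zero. Ridge (`alpha = 0`) shrinks coefficients toward zero but, unlike the LASSO, does not set them exactly to zero, whereas glmnet leaves predictors that are constant in the training matrix at zero. These exact zeros therefore suggest that those columns do not vary in `x_train`, plausibly because factor columns were coerced rather than dummy-coded when `x_train` was built (an assumption: that step is not shown here). The hypothetical helper `constant_columns()` below is a minimal base-R check.

```r
# Hypothetical helper: names of matrix columns that take a single value
# (ignoring NA). glmnet keeps the coefficients of such columns at exactly
# zero, even under ridge (alpha = 0).
constant_columns <- function(x) {
  is_constant <- apply(x, 2, function(col) {
    length(unique(col[!is.na(col)])) <= 1
  })
  colnames(x)[is_constant]
}

# Toy matrix: "sex" mimics a factor column that ended up constant
toy <- cbind(
  cu_log2 = c(9.8, 10.1, 10.4, 9.9),
  sex     = c(0, 0, 0, 0)
)
constant_columns(toy)  # returns "sex"

# With the design matrix used above (assumes x_train exists):
# constant_columns(x_train)
```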
# ELASTIC NET
fit_with_covariates_train <- cv.glmnet(x_train, y_train, alpha = 0.5, family = "gaussian")
fit_with_covariates_test <- predict(fit_with_covariates_train, s = "lambda.min", newx = x_test)
test_mse_with_covariates <- mean((y_test - fit_with_covariates_test)^2)
x_train_chemicals_only <- as.matrix(selected_data[train_indices, chemicals_full])
x_test_chemicals_only <- as.matrix(selected_data[test_indices, chemicals_full])
fit_without_covariates_train <- cv.glmnet(x_train_chemicals_only, y_train, alpha = 0.5, family = "gaussian")
fit_without_covariates_test <- predict(fit_without_covariates_train, s = "lambda.min", newx = x_test_chemicals_only)
test_mse_without_covariates <- mean((y_test - fit_without_covariates_test)^2)
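# plot() on a cv.glmnet object draws the CV error curve; the coefficient paths come from the underlying glmnet fit
plot(fit_with_covariates_train$glmnet.fit, xvar = "lambda", main = "Coefficient Path (With Covariates)")
plot(fit_without_covariates_train$glmnet.fit, xvar = "lambda", main = "Coefficient Path (Without Covariates)")
best_lambda <- fit_with_covariates_train$lambda.min # lambda that minimizes the cross-validated MSE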
coef(fit_with_covariates_train, s = best_lambda) # coefficients at the chosen lambda
## 82 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -5.3993079033
## hs_child_age_None .
## h_cohort 0.0655211032
## e3_sex_None .
## e3_yearbir_None .
## h_edumc_None .
## h_native_None 0.0520046931
## hs_as_c_Log2 .
## hs_cd_c_Log2 -0.0329314231
## hs_co_c_Log2 -0.0224384954
## hs_cs_c_Log2 0.0793739121
## hs_cu_c_Log2 0.6441104526
## hs_hg_c_Log2 -0.0018698300
## hs_mn_c_Log2 .
## hs_mo_c_Log2 -0.1118336023
## hs_pb_c_Log2 .
## hs_tl_cdich_None .
## hs_dde_cadj_Log2 -0.0632356026
## hs_ddt_cadj_Log2 .
## hs_hcb_cadj_Log2 .
## hs_pcb118_cadj_Log2 .
## hs_pcb138_cadj_Log2 .
## hs_pcb153_cadj_Log2 -0.1833493206
## hs_pcb170_cadj_Log2 -0.0562600632
## hs_pcb180_cadj_Log2 .
## hs_dep_cadj_Log2 -0.0188690919
## hs_detp_cadj_Log2 .
## hs_dmdtp_cdich_None .
## hs_dmp_cadj_Log2 .
## hs_dmtp_cadj_Log2 .
## hs_pbde153_cadj_Log2 -0.0326246963
## hs_pbde47_cadj_Log2 .
## hs_pfhxs_c_Log2 .
## hs_pfna_c_Log2 .
## hs_pfoa_c_Log2 -0.1022981441
## hs_pfos_c_Log2 -0.0730756063
## hs_pfunda_c_Log2 .
## hs_bpa_cadj_Log2 .
## hs_bupa_cadj_Log2 .
## hs_etpa_cadj_Log2 .
## hs_mepa_cadj_Log2 .
## hs_oxbe_cadj_Log2 0.0007162562
## hs_prpa_cadj_Log2 0.0056296363
## hs_trcs_cadj_Log2 .
## hs_mbzp_cadj_Log2 0.0335195179
## hs_mecpp_cadj_Log2 .
## hs_mehhp_cadj_Log2 .
## hs_mehp_cadj_Log2 .
## hs_meohp_cadj_Log2 .
## hs_mep_cadj_Log2 .
## hs_mibp_cadj_Log2 -0.0241809131
## hs_mnbp_cadj_Log2 -0.0236651281
## hs_ohminp_cadj_Log2 .
## hs_oxominp_cadj_Log2 .
## FAS_cat_None .
## hs_contactfam_3cat_num_None .
## hs_hm_pers_None -0.0031390265
## hs_participation_3cat_None .
## hs_cotinine_cdich_None .
## hs_globalexp2_None .
## hs_smk_parents_None .
## h_bfdur_Ter .
## hs_bakery_prod_Ter .
## hs_beverages_Ter .
## hs_break_cer_Ter .
## hs_caff_drink_Ter .
## hs_dairy_Ter .
## hs_fastfood_Ter .
## h_legume_preg_Ter .
## hs_org_food_Ter .
## hs_proc_meat_Ter .
## hs_readymade_Ter .
## hs_total_bread_Ter .
## hs_total_cereal_Ter .
## hs_total_fish_Ter .
## hs_total_fruits_Ter .
## hs_total_lipids_Ter .
## hs_total_meat_Ter .
## hs_total_potatoes_Ter .
## hs_total_sweets_Ter .
## hs_total_veg_Ter .
## hs_total_yog_Ter .
best_lambda <- fit_without_covariates_train$lambda.min # lambda that minimizes the cross-validated MSE
coef(fit_without_covariates_train, s = best_lambda)
## 55 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -5.165056742
## hs_as_c_Log2 .
## hs_cd_c_Log2 -0.029587674
## hs_co_c_Log2 -0.018234610
## hs_cs_c_Log2 0.105479459
## hs_cu_c_Log2 0.660783281
## hs_hg_c_Log2 -0.018407443
## hs_mn_c_Log2 .
## hs_mo_c_Log2 -0.108921552
## hs_pb_c_Log2 -0.024687445
## hs_tl_cdich_None .
## hs_dde_cadj_Log2 -0.043043336
## hs_ddt_cadj_Log2 .
## hs_hcb_cadj_Log2 .
## hs_pcb118_cadj_Log2 .
## hs_pcb138_cadj_Log2 .
## hs_pcb153_cadj_Log2 -0.169983885
## hs_pcb170_cadj_Log2 -0.059558355
## hs_pcb180_cadj_Log2 .
## hs_dep_cadj_Log2 -0.021814037
## hs_detp_cadj_Log2 0.001177980
## hs_dmdtp_cdich_None .
## hs_dmp_cadj_Log2 .
## hs_dmtp_cadj_Log2 .
## hs_pbde153_cadj_Log2 -0.035870956
## hs_pbde47_cadj_Log2 .
## hs_pfhxs_c_Log2 -0.013483975
## hs_pfna_c_Log2 .
## hs_pfoa_c_Log2 -0.141668953
## hs_pfos_c_Log2 -0.048925424
## hs_pfunda_c_Log2 .
## hs_bpa_cadj_Log2 .
## hs_bupa_cadj_Log2 .
## hs_etpa_cadj_Log2 .
## hs_mepa_cadj_Log2 -0.005175616
## hs_oxbe_cadj_Log2 0.007141167
## hs_prpa_cadj_Log2 0.005237280
## hs_trcs_cadj_Log2 .
## hs_mbzp_cadj_Log2 0.051380905
## hs_mecpp_cadj_Log2 .
## hs_mehhp_cadj_Log2 .
## hs_mehp_cadj_Log2 .
## hs_meohp_cadj_Log2 .
## hs_mep_cadj_Log2 .
## hs_mibp_cadj_Log2 -0.055499484
## hs_mnbp_cadj_Log2 -0.016778105
## hs_ohminp_cadj_Log2 .
## hs_oxominp_cadj_Log2 .
## FAS_cat_None .
## hs_contactfam_3cat_num_None .
## hs_hm_pers_None -0.018050249
## hs_participation_3cat_None .
## hs_cotinine_cdich_None .
## hs_globalexp2_None .
## hs_smk_parents_None .
cat("Model with Covariates - Test MSE:", test_mse_with_covariates, "\n")
## Model with Covariates - Test MSE: 1.185552
cat("Model without Covariates - Test MSE:", test_mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.225393
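Ridge gives the lowest test MSE of the three penalties, both with covariates (1.145 vs. 1.186 for the LASSO and the elastic net) and without (1.186 vs. 1.228 and 1.225), although the differences are small and come from a single train/test split. Because ridge shrinks coefficients without setting any of them to zero, it does not perform feature selection. The elastic net is a reasonable middle ground: it combines the LASSO's sparsity with ridge's shrinkage, which helps when many predictors are correlated, as is common for related exposures such as the PCB congeners.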
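To make the comparison above concrete, the test MSEs printed in this section can be collected in one table. The values are copied from the outputs above; the data frame and its column names are illustrative.

```r
# Test MSEs reported above for the chemical-exposure models
mse_table <- data.frame(
  model       = c("LASSO", "Ridge", "Elastic net"),
  alpha       = c(1, 0, 0.5),
  with_cov    = c(1.185848, 1.145497, 1.185552),
  without_cov = c(1.22833, 1.186358, 1.225393)
)

# Gain from adding covariates (positive = covariates lower the test MSE)
mse_table$cov_gain <- mse_table$without_cov - mse_table$with_cov

# Order by test MSE of the model with covariates
mse_table[order(mse_table$with_cov), ]
```

Ridge ranks first in both columns, and adding covariates lowers the test MSE by roughly 0.04 for every penalty. With only one random split, differences of this size may not be robust; repeating the split or using nested cross-validation would give a steadier comparison.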
# chemicals with nonzero coefficients in the elastic net model without covariates
chemicals_selected <- c(
"hs_cd_c_Log2",
"hs_co_c_Log2",
"hs_cs_c_Log2",
"hs_cu_c_Log2",
"hs_hg_c_Log2",
"hs_mo_c_Log2",
"hs_pb_c_Log2",
"hs_dde_cadj_Log2",
"hs_pcb153_cadj_Log2",
"hs_pcb170_cadj_Log2",
"hs_dep_cadj_Log2",
"hs_detp_cadj_Log2",
"hs_pbde153_cadj_Log2",
"hs_pfhxs_c_Log2",
"hs_pfoa_c_Log2",
"hs_pfos_c_Log2",
"hs_mepa_cadj_Log2",
"hs_oxbe_cadj_Log2",
"hs_prpa_cadj_Log2",
"hs_mbzp_cadj_Log2",
"hs_mibp_cadj_Log2",
"hs_mnbp_cadj_Log2",
"hs_hm_pers_None")
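The vector above was typed by hand from the elastic net output. A less error-prone route is to extract the nonzero coefficient names directly. The sketch below uses a hypothetical helper, `nonzero_features()`, that takes a named numeric vector of coefficients; for a `cv.glmnet` fit such a vector can be obtained with `as.matrix(coef(fit, s = "lambda.min"))[, 1]`.

```r
# Hypothetical helper: names of predictors with nonzero coefficients,
# excluding the intercept. `coefs` is a named numeric vector.
nonzero_features <- function(coefs, tol = 0) {
  keep <- abs(coefs) > tol & names(coefs) != "(Intercept)"
  names(coefs)[keep]
}

# A few coefficients copied from the elastic net output without covariates
enet_coefs <- c(
  "(Intercept)"   = -5.165056742,
  hs_as_c_Log2    = 0,
  hs_cd_c_Log2    = -0.029587674,
  hs_cu_c_Log2    = 0.660783281,
  hs_mn_c_Log2    = 0,
  hs_hm_pers_None = -0.018050249
)
nonzero_features(enet_coefs)
# returns "hs_cd_c_Log2", "hs_cu_c_Log2", "hs_hm_pers_None"

# With the fitted model (assumes fit_without_covariates_train from above):
# chemicals_selected <- nonzero_features(
#   as.matrix(coef(fit_without_covariates_train, s = "lambda.min"))[, 1])
```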
# LASSO with a 70/30 train/test split (postnatal diet predictors)
set.seed(101)
train_indices <- sample(seq_len(nrow(selected_data)), size = floor(0.7 * nrow(selected_data)))
test_indices <- setdiff(seq_len(nrow(selected_data)), train_indices)
diet_data <- selected_data[, postnatal_diet]
x_diet_train <- model.matrix(~ . + 0, data = diet_data[train_indices, ])
x_diet_test <- model.matrix(~ . + 0, data = diet_data[test_indices, ])
covariates <- selected_data[, c("e3_sex_None", "e3_yearbir_None", "h_edumc_None", "h_cohort", "hs_child_age_None")]
x_covariates_train <- model.matrix(~ . + 0, data = covariates[train_indices, ])
x_covariates_test <- model.matrix(~ . + 0, data = covariates[test_indices, ])
x_full_train <- cbind(x_diet_train, x_covariates_train)
x_full_test <- cbind(x_diet_test, x_covariates_test)
# replace any missing values with 0 (glmnet does not accept NA)
x_full_train[is.na(x_full_train)] <- 0
x_full_test[is.na(x_full_test)] <- 0
x_diet_train[is.na(x_diet_train)] <- 0
x_diet_test[is.na(x_diet_test)] <- 0
y_train <- as.numeric(selected_data$hs_zbmi_who[train_indices])
y_test <- as.numeric(selected_data$hs_zbmi_who[test_indices])
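The `x[is.na(x)] <- 0` lines only see missing values that survive `model.matrix()`. Under R's default `na.action` (`na.omit`), `model.matrix()` silently drops rows with missing values, so any NA in the diet or covariate data would surface as a row-count mismatch between the design matrix and `y_train` rather than being replaced by 0. Whether such values exist is not checked in this section. If they do, one sketch is to keep every row with `na.action = na.pass` and then impute explicitly. The toy data frame below is hypothetical.

```r
# Toy data with missing values in a factor and a numeric column
df <- data.frame(
  veg_ter = factor(c("low", "high", NA, "high")),
  age     = c(7.1, NA, 8.3, 6.9)
)

# Default behaviour: rows with any NA are dropped (na.action = na.omit)
x_default <- model.matrix(~ . + 0, data = df)
nrow(x_default)  # 2

# Keep all rows by building the model frame with na.pass first
mf <- model.frame(~ ., data = df, na.action = na.pass)
x_kept <- model.matrix(~ . + 0, data = mf)
nrow(x_kept)     # 4

# Rows now line up with the outcome; impute explicitly before glmnet
x_kept[is.na(x_kept)] <- 0
```

Replacing NA with 0 is itself a modelling choice; for log2-scale exposures or dummy columns, a median or mode imputation fitted on the training split may be more defensible.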
# fit models
fit_with_covariates <- cv.glmnet(x_full_train, y_train, alpha = 1, family = "gaussian")
fit_with_covariates
##
## Call: cv.glmnet(x = x_full_train, y = y_train, alpha = 1, family = "gaussian")
##
## Measure: Mean-Squared Error
##
## Lambda Index Measure SE Nonzero
## min 0.04164 17 1.404 0.06582 17
## 1se 0.18447 1 1.440 0.06226 0
fit_without_covariates <- cv.glmnet(x_diet_train, y_train, alpha = 1, family = "gaussian")
fit_without_covariates
##
## Call: cv.glmnet(x = x_diet_train, y = y_train, alpha = 1, family = "gaussian")
##
## Measure: Mean-Squared Error
##
## Lambda Index Measure SE Nonzero
## min 0.03609 16 1.423 0.08753 14
## 1se 0.14570 1 1.440 0.08232 0
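In both printouts the `1se` row sits at index 1 with 0 nonzero coefficients, i.e. the intercept-only model. glmnet's `lambda.1se` is the largest lambda whose cross-validated error is within one standard error of the minimum, so this says the null model's CV error (1.440) is within one SE of the best diet model. The arithmetic, using the values printed above and a hypothetical helper `within_one_se()`:

```r
# TRUE if a candidate's CV error is within one SE of the minimum CV error
within_one_se <- function(cvm_min, se_min, cvm_candidate) {
  cvm_candidate <= cvm_min + se_min
}

# With covariates: min CV MSE 1.404 (SE 0.06582); intercept-only 1.440
within_one_se(1.404, 0.06582, 1.440)  # TRUE: 1.440 <= 1.46982
# Without covariates: min CV MSE 1.423 (SE 0.08753); intercept-only 1.440
within_one_se(1.423, 0.08753, 1.440)  # TRUE: 1.440 <= 1.51053
```

Read cautiously, this means that under the one-SE rule the dietary tertiles add little cross-validated predictive value over the intercept alone.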
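# plot() on a cv.glmnet object draws the CV error curve; the coefficient paths come from the underlying glmnet fit
plot(fit_with_covariates$glmnet.fit, xvar = "lambda", main = "Coefficient Path (With Covariates)")
plot(fit_without_covariates$glmnet.fit, xvar = "lambda", main = "Coefficient Path (Without Covariates)")
best_lambda <- fit_with_covariates$lambda.min # lambda that minimizes the cross-validated MSE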
coef(fit_with_covariates, s = best_lambda) # coefficients at the chosen lambda
## 59 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 0.5648339722
## h_bfdur_Ter(0,10.8] .
## h_bfdur_Ter(10.8,34.9] .
## h_bfdur_Ter(34.9,Inf] .
## hs_bakery_prod_Ter(2,6] .
## hs_bakery_prod_Ter(6,Inf] -0.1133584362
## hs_beverages_Ter(0.132,1] .
## hs_beverages_Ter(1,Inf] .
## hs_break_cer_Ter(1.1,5.5] .
## hs_break_cer_Ter(5.5,Inf] -0.0316932102
## hs_caff_drink_Ter(0.132,Inf] .
## hs_dairy_Ter(14.6,25.6] 0.0088653749
## hs_dairy_Ter(25.6,Inf] .
## hs_fastfood_Ter(0.132,0.5] .
## hs_fastfood_Ter(0.5,Inf] .
## h_legume_preg_Ter(0.5,2] .
## h_legume_preg_Ter(2,Inf] .
## hs_org_food_Ter(0.132,1] .
## hs_org_food_Ter(1,Inf] -0.1213297545
## hs_proc_meat_Ter(1.5,4] .
## hs_proc_meat_Ter(4,Inf] .
## hs_readymade_Ter(0.132,0.5] .
## hs_readymade_Ter(0.5,Inf] .
## hs_total_bread_Ter(7,17.5] .
## hs_total_bread_Ter(17.5,Inf] .
## hs_total_cereal_Ter(14.1,23.6] .
## hs_total_cereal_Ter(23.6,Inf] .
## hs_total_fish_Ter(1.5,3] .
## hs_total_fish_Ter(3,Inf] .
## hs_total_fruits_Ter(7,14.1] 0.0002333088
## hs_total_fruits_Ter(14.1,Inf] -0.0126943442
## hs_total_lipids_Ter(3,7] .
## hs_total_lipids_Ter(7,Inf] -0.0063934032
## hs_total_meat_Ter(6,9] .
## hs_total_meat_Ter(9,Inf] .
## hs_total_potatoes_Ter(3,4] .
## hs_total_potatoes_Ter(4,Inf] .
## hs_total_sweets_Ter(4.1,8.5] -0.0749964702
## hs_total_sweets_Ter(8.5,Inf] .
## hs_total_veg_Ter(6,8.5] .
## hs_total_veg_Ter(8.5,Inf] -0.0598834839
## hs_total_yog_Ter(6,8.5] .
## hs_total_yog_Ter(8.5,Inf] .
## e3_sex_Nonefemale -0.0456861174
## e3_sex_Nonemale .
## e3_yearbir_None2004 -0.0438776088
## e3_yearbir_None2005 .
## e3_yearbir_None2006 .
## e3_yearbir_None2007 .
## e3_yearbir_None2008 .
## e3_yearbir_None2009 .
## h_edumc_None2 0.0018813738
## h_edumc_None3 -0.0724241304
## h_cohort2 .
## h_cohort3 0.3406608342
## h_cohort4 0.1018684574
## h_cohort5 -0.1592809829
## h_cohort6 0.1985783490
## hs_child_age_None .
best_lambda <- fit_without_covariates$lambda.min # lambda that minimizes the cross-validated MSE
coef(fit_without_covariates, s = best_lambda)
## 43 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 0.636340202
## h_bfdur_Ter(0,10.8] .
## h_bfdur_Ter(10.8,34.9] 0.023906468
## h_bfdur_Ter(34.9,Inf] .
## hs_bakery_prod_Ter(2,6] .
## hs_bakery_prod_Ter(6,Inf] -0.056385462
## hs_beverages_Ter(0.132,1] .
## hs_beverages_Ter(1,Inf] .
## hs_break_cer_Ter(1.1,5.5] .
## hs_break_cer_Ter(5.5,Inf] -0.055371885
## hs_caff_drink_Ter(0.132,Inf] .
## hs_dairy_Ter(14.6,25.6] 0.046097840
## hs_dairy_Ter(25.6,Inf] .
## hs_fastfood_Ter(0.132,0.5] .
## hs_fastfood_Ter(0.5,Inf] .
## h_legume_preg_Ter(0.5,2] 0.122896858
## h_legume_preg_Ter(2,Inf] .
## hs_org_food_Ter(0.132,1] .
## hs_org_food_Ter(1,Inf] -0.184946634
## hs_proc_meat_Ter(1.5,4] 0.005012152
## hs_proc_meat_Ter(4,Inf] -0.007381914
## hs_readymade_Ter(0.132,0.5] .
## hs_readymade_Ter(0.5,Inf] .
## hs_total_bread_Ter(7,17.5] .
## hs_total_bread_Ter(17.5,Inf] .
## hs_total_cereal_Ter(14.1,23.6] .
## hs_total_cereal_Ter(23.6,Inf] .
## hs_total_fish_Ter(1.5,3] -0.056227911
## hs_total_fish_Ter(3,Inf] .
## hs_total_fruits_Ter(7,14.1] 0.009755541
## hs_total_fruits_Ter(14.1,Inf] -0.053778743
## hs_total_lipids_Ter(3,7] .
## hs_total_lipids_Ter(7,Inf] -0.081293095
## hs_total_meat_Ter(6,9] .
## hs_total_meat_Ter(9,Inf] .
## hs_total_potatoes_Ter(3,4] .
## hs_total_potatoes_Ter(4,Inf] .
## hs_total_sweets_Ter(4.1,8.5] -0.098908669
## hs_total_sweets_Ter(8.5,Inf] .
## hs_total_veg_Ter(6,8.5] .
## hs_total_veg_Ter(8.5,Inf] -0.118700721
## hs_total_yog_Ter(6,8.5] .
## hs_total_yog_Ter(8.5,Inf] .
predictions_with_covariates <- predict(fit_with_covariates, s = "lambda.min", newx = x_full_test)
mse_with_covariates <- mean((y_test - predictions_with_covariates)^2)
predictions_without_covariates <- predict(fit_without_covariates, s = "lambda.min", newx = x_diet_test)
mse_without_covariates <- mean((y_test - predictions_without_covariates)^2)
cat("Model with Covariates - Test MSE:", mse_with_covariates, "\n")
## Model with Covariates - Test MSE: 1.290294
cat("Model without Covariates - Test MSE:", mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.339523
# RIDGE
fit_with_covariates <- cv.glmnet(x_full_train, y_train, alpha = 0, family = "gaussian")
fit_with_covariates
##
## Call: cv.glmnet(x = x_full_train, y = y_train, alpha = 0, family = "gaussian")
##
## Measure: Mean-Squared Error
##
## Lambda Index Measure SE Nonzero
## min 2.8 46 1.411 0.05847 58
## 1se 184.5 1 1.443 0.05868 58
fit_without_covariates <- cv.glmnet(x_diet_train, y_train, alpha = 0, family = "gaussian")
fit_without_covariates
##
## Call: cv.glmnet(x = x_diet_train, y = y_train, alpha = 0, family = "gaussian")
##
## Measure: Mean-Squared Error
##
## Lambda Index Measure SE Nonzero
## min 2.67 44 1.428 0.09579 42
## 1se 145.70 1 1.442 0.09676 42
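# plot() on a cv.glmnet object draws the CV error curve; the coefficient paths come from the underlying glmnet fit
plot(fit_with_covariates$glmnet.fit, xvar = "lambda", main = "Coefficient Path (With Covariates)")
plot(fit_without_covariates$glmnet.fit, xvar = "lambda", main = "Coefficient Path (Without Covariates)")
best_lambda <- fit_with_covariates$lambda.min # lambda that minimizes the cross-validated MSE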
coef(fit_with_covariates, s = best_lambda) # coefficients at the chosen lambda
## 59 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 5.284758e-01
## h_bfdur_Ter(0,10.8] -1.099740e-02
## h_bfdur_Ter(10.8,34.9] 2.348192e-02
## h_bfdur_Ter(34.9,Inf] -5.818857e-03
## hs_bakery_prod_Ter(2,6] 1.787112e-02
## hs_bakery_prod_Ter(6,Inf] -4.243477e-02
## hs_beverages_Ter(0.132,1] -5.906794e-03
## hs_beverages_Ter(1,Inf] 6.641315e-04
## hs_break_cer_Ter(1.1,5.5] -6.605575e-03
## hs_break_cer_Ter(5.5,Inf] -3.912608e-02
## hs_caff_drink_Ter(0.132,Inf] -8.278957e-03
## hs_dairy_Ter(14.6,25.6] 3.591020e-02
## hs_dairy_Ter(25.6,Inf] 6.424051e-03
## hs_fastfood_Ter(0.132,0.5] 2.178827e-02
## hs_fastfood_Ter(0.5,Inf] -4.269624e-03
## h_legume_preg_Ter(0.5,2] 4.974182e-02
## h_legume_preg_Ter(2,Inf] 1.585245e-03
## hs_org_food_Ter(0.132,1] 2.509655e-02
## hs_org_food_Ter(1,Inf] -6.642514e-02
## hs_proc_meat_Ter(1.5,4] 2.290516e-02
## hs_proc_meat_Ter(4,Inf] -1.869895e-02
## hs_readymade_Ter(0.132,0.5] -3.529727e-03
## hs_readymade_Ter(0.5,Inf] 2.083586e-02
## hs_total_bread_Ter(7,17.5] -1.349903e-02
## hs_total_bread_Ter(17.5,Inf] 3.056733e-03
## hs_total_cereal_Ter(14.1,23.6] 7.759686e-03
## hs_total_cereal_Ter(23.6,Inf] -7.029716e-03
## hs_total_fish_Ter(1.5,3] -3.342082e-02
## hs_total_fish_Ter(3,Inf] -6.005614e-03
## hs_total_fruits_Ter(7,14.1] 2.803696e-02
## hs_total_fruits_Ter(14.1,Inf] -3.681803e-02
## hs_total_lipids_Ter(3,7] 4.843535e-03
## hs_total_lipids_Ter(7,Inf] -4.016998e-02
## hs_total_meat_Ter(6,9] 5.159825e-04
## hs_total_meat_Ter(9,Inf] 3.464491e-05
## hs_total_potatoes_Ter(3,4] 1.513769e-02
## hs_total_potatoes_Ter(4,Inf] -3.104207e-03
## hs_total_sweets_Ter(4.1,8.5] -4.691703e-02
## hs_total_sweets_Ter(8.5,Inf] 3.756444e-03
## hs_total_veg_Ter(6,8.5] -1.722606e-03
## hs_total_veg_Ter(8.5,Inf] -5.121521e-02
## hs_total_yog_Ter(6,8.5] -7.640896e-03
## hs_total_yog_Ter(8.5,Inf] -9.306272e-03
## e3_sex_Nonefemale -3.002022e-02
## e3_sex_Nonemale 3.001873e-02
## e3_yearbir_None2004 -6.037200e-02
## e3_yearbir_None2005 2.739531e-02
## e3_yearbir_None2006 -2.330959e-02
## e3_yearbir_None2007 -5.212367e-03
## e3_yearbir_None2008 2.717594e-02
## e3_yearbir_None2009 -3.054503e-02
## h_edumc_None2 4.119509e-02
## h_edumc_None3 -5.287052e-02
## h_cohort2 -3.451696e-02
## h_cohort3 1.160360e-01
## h_cohort4 3.965935e-02
## h_cohort5 -9.174135e-02
## h_cohort6 5.828852e-02
## hs_child_age_None -4.847617e-03
best_lambda <- fit_without_covariates$lambda.min # lambda that minimizes the cross-validated MSE
coef(fit_without_covariates, s = best_lambda)
## 43 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 5.124985e-01
## h_bfdur_Ter(0,10.8] -1.358118e-02
## h_bfdur_Ter(10.8,34.9] 3.774674e-02
## h_bfdur_Ter(34.9,Inf] -1.340757e-02
## hs_bakery_prod_Ter(2,6] 2.587561e-02
## hs_bakery_prod_Ter(6,Inf] -3.405874e-02
## hs_beverages_Ter(0.132,1] -9.027534e-03
## hs_beverages_Ter(1,Inf] 5.374768e-05
## hs_break_cer_Ter(1.1,5.5] -6.059998e-03
## hs_break_cer_Ter(5.5,Inf] -4.124579e-02
## hs_caff_drink_Ter(0.132,Inf] -1.677759e-02
## hs_dairy_Ter(14.6,25.6] 4.144777e-02
## hs_dairy_Ter(25.6,Inf] 1.323447e-03
## hs_fastfood_Ter(0.132,0.5] 2.022363e-02
## hs_fastfood_Ter(0.5,Inf] 5.573297e-04
## h_legume_preg_Ter(0.5,2] 7.276374e-02
## h_legume_preg_Ter(2,Inf] 1.171750e-02
## hs_org_food_Ter(0.132,1] 1.744858e-02
## hs_org_food_Ter(1,Inf] -8.024588e-02
## hs_proc_meat_Ter(1.5,4] 2.556512e-02
## hs_proc_meat_Ter(4,Inf] -2.290499e-02
## hs_readymade_Ter(0.132,0.5] -1.518843e-03
## hs_readymade_Ter(0.5,Inf] 1.336285e-02
## hs_total_bread_Ter(7,17.5] -5.426895e-03
## hs_total_bread_Ter(17.5,Inf] -7.750245e-03
## hs_total_cereal_Ter(14.1,23.6] 1.018806e-02
## hs_total_cereal_Ter(23.6,Inf] -1.449850e-02
## hs_total_fish_Ter(1.5,3] -4.277028e-02
## hs_total_fish_Ter(3,Inf] -7.793985e-03
## hs_total_fruits_Ter(7,14.1] 3.021019e-02
## hs_total_fruits_Ter(14.1,Inf] -4.358446e-02
## hs_total_lipids_Ter(3,7] -2.401764e-03
## hs_total_lipids_Ter(7,Inf] -5.319258e-02
## hs_total_meat_Ter(6,9] 2.804418e-04
## hs_total_meat_Ter(9,Inf] 2.156405e-03
## hs_total_potatoes_Ter(3,4] 1.359808e-02
## hs_total_potatoes_Ter(4,Inf] 6.311069e-03
## hs_total_sweets_Ter(4.1,8.5] -4.902698e-02
## hs_total_sweets_Ter(8.5,Inf] 3.982619e-04
## hs_total_veg_Ter(6,8.5] -1.589070e-03
## hs_total_veg_Ter(8.5,Inf] -6.559892e-02
## hs_total_yog_Ter(6,8.5] -1.131931e-02
## hs_total_yog_Ter(8.5,Inf] -1.062038e-02
predictions_with_covariates <- predict(fit_with_covariates, s = "lambda.min", newx = x_full_test)
mse_with_covariates <- mean((y_test - predictions_with_covariates)^2)
predictions_without_covariates <- predict(fit_without_covariates, s = "lambda.min", newx = x_diet_test)
mse_without_covariates <- mean((y_test - predictions_without_covariates)^2)
cat("Model with Covariates - Test MSE:", mse_with_covariates, "\n")
## Model with Covariates - Test MSE: 1.293595
cat("Model without Covariates - Test MSE:", mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.32074
# ELASTIC NET
fit_with_covariates <- cv.glmnet(x_full_train, y_train, alpha = 0.5, family = "gaussian")
fit_with_covariates
##
## Call: cv.glmnet(x = x_full_train, y = y_train, alpha = 0.5, family = "gaussian")
##
## Measure: Mean-Squared Error
##
## Lambda Index Measure SE Nonzero
## min 0.0759 18 1.395 0.08667 21
## 1se 0.3689 1 1.443 0.08799 0
fit_without_covariates <- cv.glmnet(x_diet_train, y_train, alpha = 0.5, family = "gaussian")
fit_without_covariates
##
## Call: cv.glmnet(x = x_diet_train, y = y_train, alpha = 0.5, family = "gaussian")
##
## Measure: Mean-Squared Error
##
## Lambda Index Measure SE Nonzero
## min 0.07218 16 1.423 0.04773 14
## 1se 0.29139 1 1.443 0.04721 0
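# plot() on a cv.glmnet object draws the CV error curve; the coefficient paths come from the underlying glmnet fit
plot(fit_with_covariates$glmnet.fit, xvar = "lambda", main = "Coefficient Path (With Covariates)")
plot(fit_without_covariates$glmnet.fit, xvar = "lambda", main = "Coefficient Path (Without Covariates)")
best_lambda <- fit_with_covariates$lambda.min # lambda that minimizes the cross-validated MSE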
coef(fit_with_covariates, s = best_lambda) # coefficients at the chosen lambda
## 59 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 0.556988350
## h_bfdur_Ter(0,10.8] .
## h_bfdur_Ter(10.8,34.9] .
## h_bfdur_Ter(34.9,Inf] .
## hs_bakery_prod_Ter(2,6] .
## hs_bakery_prod_Ter(6,Inf] -0.117162076
## hs_beverages_Ter(0.132,1] .
## hs_beverages_Ter(1,Inf] .
## hs_break_cer_Ter(1.1,5.5] .
## hs_break_cer_Ter(5.5,Inf] -0.038515496
## hs_caff_drink_Ter(0.132,Inf] .
## hs_dairy_Ter(14.6,25.6] 0.016190661
## hs_dairy_Ter(25.6,Inf] .
## hs_fastfood_Ter(0.132,0.5] 0.002015339
## hs_fastfood_Ter(0.5,Inf] .
## h_legume_preg_Ter(0.5,2] .
## h_legume_preg_Ter(2,Inf] .
## hs_org_food_Ter(0.132,1] .
## hs_org_food_Ter(1,Inf] -0.124678871
## hs_proc_meat_Ter(1.5,4] .
## hs_proc_meat_Ter(4,Inf] .
## hs_readymade_Ter(0.132,0.5] .
## hs_readymade_Ter(0.5,Inf] .
## hs_total_bread_Ter(7,17.5] .
## hs_total_bread_Ter(17.5,Inf] .
## hs_total_cereal_Ter(14.1,23.6] .
## hs_total_cereal_Ter(23.6,Inf] .
## hs_total_fish_Ter(1.5,3] -0.006501057
## hs_total_fish_Ter(3,Inf] .
## hs_total_fruits_Ter(7,14.1] 0.008385672
## hs_total_fruits_Ter(14.1,Inf] -0.013686766
## hs_total_lipids_Ter(3,7] .
## hs_total_lipids_Ter(7,Inf] -0.013220992
## hs_total_meat_Ter(6,9] .
## hs_total_meat_Ter(9,Inf] .
## hs_total_potatoes_Ter(3,4] .
## hs_total_potatoes_Ter(4,Inf] .
## hs_total_sweets_Ter(4.1,8.5] -0.080056006
## hs_total_sweets_Ter(8.5,Inf] .
## hs_total_veg_Ter(6,8.5] .
## hs_total_veg_Ter(8.5,Inf] -0.062455960
## hs_total_yog_Ter(6,8.5] .
## hs_total_yog_Ter(8.5,Inf] .
## e3_sex_Nonefemale -0.033536211
## e3_sex_Nonemale 0.019075233
## e3_yearbir_None2004 -0.055993440
## e3_yearbir_None2005 .
## e3_yearbir_None2006 .
## e3_yearbir_None2007 .
## e3_yearbir_None2008 0.001843416
## e3_yearbir_None2009 .
## h_edumc_None2 0.010841027
## h_edumc_None3 -0.070926328
## h_cohort2 .
## h_cohort3 0.330368900
## h_cohort4 0.102865663
## h_cohort5 -0.164328625
## h_cohort6 0.191650613
## hs_child_age_None .
best_lambda <- fit_without_covariates$lambda.min # lambda that minimizes the cross-validated MSE
coef(fit_without_covariates, s = best_lambda)
## 43 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 0.629842910
## h_bfdur_Ter(0,10.8] .
## h_bfdur_Ter(10.8,34.9] 0.024327566
## h_bfdur_Ter(34.9,Inf] .
## hs_bakery_prod_Ter(2,6] .
## hs_bakery_prod_Ter(6,Inf] -0.054460574
## hs_beverages_Ter(0.132,1] .
## hs_beverages_Ter(1,Inf] .
## hs_break_cer_Ter(1.1,5.5] .
## hs_break_cer_Ter(5.5,Inf] -0.053751932
## hs_caff_drink_Ter(0.132,Inf] .
## hs_dairy_Ter(14.6,25.6] 0.045347081
## hs_dairy_Ter(25.6,Inf] .
## hs_fastfood_Ter(0.132,0.5] .
## hs_fastfood_Ter(0.5,Inf] .
## h_legume_preg_Ter(0.5,2] 0.119953594
## h_legume_preg_Ter(2,Inf] .
## hs_org_food_Ter(0.132,1] .
## hs_org_food_Ter(1,Inf] -0.180021881
## hs_proc_meat_Ter(1.5,4] 0.005675465
## hs_proc_meat_Ter(4,Inf] -0.006853354
## hs_readymade_Ter(0.132,0.5] .
## hs_readymade_Ter(0.5,Inf] .
## hs_total_bread_Ter(7,17.5] .
## hs_total_bread_Ter(17.5,Inf] .
## hs_total_cereal_Ter(14.1,23.6] .
## hs_total_cereal_Ter(23.6,Inf] .
## hs_total_fish_Ter(1.5,3] -0.054562906
## hs_total_fish_Ter(3,Inf] .
## hs_total_fruits_Ter(7,14.1] 0.009891884
## hs_total_fruits_Ter(14.1,Inf] -0.053496352
## hs_total_lipids_Ter(3,7] .
## hs_total_lipids_Ter(7,Inf] -0.080133298
## hs_total_meat_Ter(6,9] .
## hs_total_meat_Ter(9,Inf] .
## hs_total_potatoes_Ter(3,4] .
## hs_total_potatoes_Ter(4,Inf] .
## hs_total_sweets_Ter(4.1,8.5] -0.095378443
## hs_total_sweets_Ter(8.5,Inf] .
## hs_total_veg_Ter(6,8.5] .
## hs_total_veg_Ter(8.5,Inf] -0.116385243
## hs_total_yog_Ter(6,8.5] .
## hs_total_yog_Ter(8.5,Inf] .
predictions_with_covariates <- predict(fit_with_covariates, s = "lambda.min", newx = x_full_test)
mse_with_covariates <- mean((y_test - predictions_with_covariates)^2)
predictions_without_covariates <- predict(fit_without_covariates, s = "lambda.min", newx = x_diet_test)
mse_without_covariates <- mean((y_test - predictions_without_covariates)^2)
cat("Model with Covariates - Test MSE:", mse_with_covariates, "\n")
## Model with Covariates - Test MSE: 1.288522
cat("Model without Covariates - Test MSE:", mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.339045
set.seed(101)
train_indices <- sample(seq_len(nrow(selected_data)), size = floor(0.7 * nrow(selected_data)))
test_indices <- setdiff(seq_len(nrow(selected_data)), train_indices)
diet_data <- selected_data[, postnatal_diet]
x_diet_train <- model.matrix(~ . + 0, data = diet_data[train_indices, ])
x_diet_test <- model.matrix(~ . + 0, data = diet_data[test_indices, ])
chemical_data <- selected_data[, chemicals_full]
x_chemical_train <- as.matrix(chemical_data[train_indices, ])
x_chemical_test <- as.matrix(chemical_data[test_indices, ])
covariates <- selected_data[, c("e3_sex_None", "e3_yearbir_None", "h_edumc_None", "h_cohort", "hs_child_age_None")]
x_covariates_train <- model.matrix(~ . + 0, data = covariates[train_indices, ])
x_covariates_test <- model.matrix(~ . + 0, data = covariates[test_indices, ])
# combine diet and chemical predictors; x_full_* additionally includes the covariates
x_combined_train <- cbind(x_diet_train, x_chemical_train)
x_combined_test <- cbind(x_diet_test, x_chemical_test)
x_full_train <- cbind(x_combined_train, x_covariates_train)
x_full_test <- cbind(x_combined_test, x_covariates_test)
# replace any missing values with 0 (glmnet does not accept NA)
x_full_train[is.na(x_full_train)] <- 0
x_full_test[is.na(x_full_test)] <- 0
x_combined_train[is.na(x_combined_train)] <- 0
x_combined_test[is.na(x_combined_test)] <- 0
y_train <- as.numeric(selected_data$hs_zbmi_who[train_indices])
y_test <- as.numeric(selected_data$hs_zbmi_who[test_indices])
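Because `x_full_train` and `x_full_test` are assembled from separate `model.matrix()` calls and combined with `cbind()`, a quick sanity check before fitting can confirm that the predictor columns line up across splits and that no rows were dropped. The helper `check_design()` below is hypothetical, a minimal sketch rather than part of the original workflow.

```r
# Hypothetical helper: sanity checks before passing design matrices to glmnet
check_design <- function(x_train, x_test, y_train, y_test) {
  stopifnot(
    identical(colnames(x_train), colnames(x_test)),  # same predictors, same order
    nrow(x_train) == length(y_train),                # no rows silently dropped
    nrow(x_test) == length(y_test),
    !anyNA(x_train), !anyNA(x_test)                  # glmnet does not accept NA
  )
  invisible(TRUE)
}

# Toy example with matching matrices
a <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("diet", "chem")))
b <- matrix(7:10, nrow = 2, dimnames = list(NULL, c("diet", "chem")))
check_design(a, b, y_train = c(0.1, 0.4, -0.2), y_test = c(1.1, 0.3))

# With the objects above (assumes they exist):
# check_design(x_full_train, x_full_test, y_train, y_test)
# check_design(x_combined_train, x_combined_test, y_train, y_test)
```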
# LASSO
fit_with_covariates <- cv.glmnet(x_full_train, y_train, alpha = 1, family = "gaussian")
predictions_with_covariates <- predict(fit_with_covariates, s = "lambda.min", newx = x_full_test)
mse_with_covariates <- mean((y_test - predictions_with_covariates)^2)
fit_without_covariates <- cv.glmnet(x_combined_train, y_train, alpha = 1, family = "gaussian")
predictions_without_covariates <- predict(fit_without_covariates, s = "lambda.min", newx = x_combined_test)
mse_without_covariates <- mean((y_test - predictions_without_covariates)^2)
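# plot() on a cv.glmnet object draws the CV error curve; the coefficient paths come from the underlying glmnet fit
plot(fit_with_covariates$glmnet.fit, xvar = "lambda", main = "Coefficient Path (With Covariates)")
plot(fit_without_covariates$glmnet.fit, xvar = "lambda", main = "Coefficient Path (Without Covariates)")
best_lambda <- fit_with_covariates$lambda.min # lambda that minimizes the cross-validated MSE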
coef(fit_with_covariates, s = best_lambda) # coefficients at the chosen lambda
## 113 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -4.760129e+00
## h_bfdur_Ter(0,10.8] -8.142886e-02
## h_bfdur_Ter(10.8,34.9] .
## h_bfdur_Ter(34.9,Inf] 3.535147e-02
## hs_bakery_prod_Ter(2,6] .
## hs_bakery_prod_Ter(6,Inf] -1.905172e-01
## hs_beverages_Ter(0.132,1] .
## hs_beverages_Ter(1,Inf] .
## hs_break_cer_Ter(1.1,5.5] .
## hs_break_cer_Ter(5.5,Inf] .
## hs_caff_drink_Ter(0.132,Inf] .
## hs_dairy_Ter(14.6,25.6] .
## hs_dairy_Ter(25.6,Inf] .
## hs_fastfood_Ter(0.132,0.5] 4.249786e-02
## hs_fastfood_Ter(0.5,Inf] .
## h_legume_preg_Ter(0.5,2] .
## h_legume_preg_Ter(2,Inf] -4.964957e-02
## hs_org_food_Ter(0.132,1] .
## hs_org_food_Ter(1,Inf] .
## hs_proc_meat_Ter(1.5,4] .
## hs_proc_meat_Ter(4,Inf] .
## hs_readymade_Ter(0.132,0.5] .
## hs_readymade_Ter(0.5,Inf] .
## hs_total_bread_Ter(7,17.5] -2.257386e-02
## hs_total_bread_Ter(17.5,Inf] .
## hs_total_cereal_Ter(14.1,23.6] .
## hs_total_cereal_Ter(23.6,Inf] .
## hs_total_fish_Ter(1.5,3] .
## hs_total_fish_Ter(3,Inf] .
## hs_total_fruits_Ter(7,14.1] .
## hs_total_fruits_Ter(14.1,Inf] -4.053583e-03
## hs_total_lipids_Ter(3,7] .
## hs_total_lipids_Ter(7,Inf] -7.313663e-03
## hs_total_meat_Ter(6,9] .
## hs_total_meat_Ter(9,Inf] .
## hs_total_potatoes_Ter(3,4] .
## hs_total_potatoes_Ter(4,Inf] .
## hs_total_sweets_Ter(4.1,8.5] -4.057094e-03
## hs_total_sweets_Ter(8.5,Inf] .
## hs_total_veg_Ter(6,8.5] .
## hs_total_veg_Ter(8.5,Inf] .
## hs_total_yog_Ter(6,8.5] .
## hs_total_yog_Ter(8.5,Inf] .
## hs_as_c_Log2 .
## hs_cd_c_Log2 -7.112010e-03
## hs_co_c_Log2 .
## hs_cs_c_Log2 1.008616e-01
## hs_cu_c_Log2 6.321231e-01
## hs_hg_c_Log2 -1.434019e-02
## hs_mn_c_Log2 .
## hs_mo_c_Log2 -8.139669e-02
## hs_pb_c_Log2 -2.612069e-03
## hs_tl_cdich_None .
## hs_dde_cadj_Log2 -2.914806e-02
## hs_ddt_cadj_Log2 .
## hs_hcb_cadj_Log2 .
## hs_pcb118_cadj_Log2 .
## hs_pcb138_cadj_Log2 .
## hs_pcb153_cadj_Log2 -2.722245e-01
## hs_pcb170_cadj_Log2 -5.353440e-02
## hs_pcb180_cadj_Log2 .
## hs_dep_cadj_Log2 -1.500516e-02
## hs_detp_cadj_Log2 .
## hs_dmdtp_cdich_None .
## hs_dmp_cadj_Log2 .
## hs_dmtp_cadj_Log2 .
## hs_pbde153_cadj_Log2 -3.347976e-02
## hs_pbde47_cadj_Log2 .
## hs_pfhxs_c_Log2 .
## hs_pfna_c_Log2 .
## hs_pfoa_c_Log2 -1.269209e-01
## hs_pfos_c_Log2 .
## hs_pfunda_c_Log2 .
## hs_bpa_cadj_Log2 .
## hs_bupa_cadj_Log2 .
## hs_etpa_cadj_Log2 .
## hs_mepa_cadj_Log2 -1.611062e-03
## hs_oxbe_cadj_Log2 .
## hs_prpa_cadj_Log2 .
## hs_trcs_cadj_Log2 .
## hs_mbzp_cadj_Log2 3.333918e-02
## hs_mecpp_cadj_Log2 .
## hs_mehhp_cadj_Log2 .
## hs_mehp_cadj_Log2 .
## hs_meohp_cadj_Log2 .
## hs_mep_cadj_Log2 .
## hs_mibp_cadj_Log2 -1.952485e-02
## hs_mnbp_cadj_Log2 .
## hs_ohminp_cadj_Log2 .
## hs_oxominp_cadj_Log2 .
## FAS_cat_None .
## hs_contactfam_3cat_num_None .
## hs_hm_pers_None .
## hs_participation_3cat_None .
## hs_cotinine_cdich_None .
## hs_globalexp2_None .
## hs_smk_parents_None .
## e3_sex_Nonefemale -1.145910e-01
## e3_sex_Nonemale 4.837453e-16
## e3_yearbir_None2004 -8.130929e-02
## e3_yearbir_None2005 .
## e3_yearbir_None2006 .
## e3_yearbir_None2007 .
## e3_yearbir_None2008 .
## e3_yearbir_None2009 .
## h_edumc_None2 .
## h_edumc_None3 .
## h_cohort2 -4.177208e-02
## h_cohort3 3.549906e-01
## h_cohort4 2.054442e-01
## h_cohort5 .
## h_cohort6 .
## hs_child_age_None .
best_lambda <- fit_without_covariates$lambda.min # lambda with the lowest cross-validated MSE
coef(fit_without_covariates, s = best_lambda)
## 97 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -5.2935057038
## h_bfdur_Ter(0,10.8] -0.1339940396
## h_bfdur_Ter(10.8,34.9] .
## h_bfdur_Ter(34.9,Inf] 0.0089494335
## hs_bakery_prod_Ter(2,6] .
## hs_bakery_prod_Ter(6,Inf] -0.2141485632
## hs_beverages_Ter(0.132,1] .
## hs_beverages_Ter(1,Inf] .
## hs_break_cer_Ter(1.1,5.5] .
## hs_break_cer_Ter(5.5,Inf] .
## hs_caff_drink_Ter(0.132,Inf] .
## hs_dairy_Ter(14.6,25.6] 0.0067011171
## hs_dairy_Ter(25.6,Inf] .
## hs_fastfood_Ter(0.132,0.5] 0.0753896525
## hs_fastfood_Ter(0.5,Inf] .
## h_legume_preg_Ter(0.5,2] .
## h_legume_preg_Ter(2,Inf] -0.0976436998
## hs_org_food_Ter(0.132,1] .
## hs_org_food_Ter(1,Inf] .
## hs_proc_meat_Ter(1.5,4] .
## hs_proc_meat_Ter(4,Inf] .
## hs_readymade_Ter(0.132,0.5] .
## hs_readymade_Ter(0.5,Inf] 0.0093784806
## hs_total_bread_Ter(7,17.5] -0.0133317671
## hs_total_bread_Ter(17.5,Inf] .
## hs_total_cereal_Ter(14.1,23.6] .
## hs_total_cereal_Ter(23.6,Inf] .
## hs_total_fish_Ter(1.5,3] -0.0293188976
## hs_total_fish_Ter(3,Inf] .
## hs_total_fruits_Ter(7,14.1] .
## hs_total_fruits_Ter(14.1,Inf] -0.0224728007
## hs_total_lipids_Ter(3,7] .
## hs_total_lipids_Ter(7,Inf] -0.0481926073
## hs_total_meat_Ter(6,9] .
## hs_total_meat_Ter(9,Inf] .
## hs_total_potatoes_Ter(3,4] 0.0164385738
## hs_total_potatoes_Ter(4,Inf] .
## hs_total_sweets_Ter(4.1,8.5] -0.0198498846
## hs_total_sweets_Ter(8.5,Inf] .
## hs_total_veg_Ter(6,8.5] .
## hs_total_veg_Ter(8.5,Inf] -0.0427817570
## hs_total_yog_Ter(6,8.5] .
## hs_total_yog_Ter(8.5,Inf] .
## hs_as_c_Log2 .
## hs_cd_c_Log2 -0.0265679349
## hs_co_c_Log2 -0.0097399554
## hs_cs_c_Log2 0.0689070580
## hs_cu_c_Log2 0.6918303003
## hs_hg_c_Log2 -0.0164166806
## hs_mn_c_Log2 .
## hs_mo_c_Log2 -0.1028019657
## hs_pb_c_Log2 .
## hs_tl_cdich_None .
## hs_dde_cadj_Log2 -0.0281622667
## hs_ddt_cadj_Log2 .
## hs_hcb_cadj_Log2 .
## hs_pcb118_cadj_Log2 .
## hs_pcb138_cadj_Log2 .
## hs_pcb153_cadj_Log2 -0.2416501185
## hs_pcb170_cadj_Log2 -0.0554100591
## hs_pcb180_cadj_Log2 .
## hs_dep_cadj_Log2 -0.0195864300
## hs_detp_cadj_Log2 .
## hs_dmdtp_cdich_None .
## hs_dmp_cadj_Log2 .
## hs_dmtp_cadj_Log2 .
## hs_pbde153_cadj_Log2 -0.0356346258
## hs_pbde47_cadj_Log2 .
## hs_pfhxs_c_Log2 .
## hs_pfna_c_Log2 .
## hs_pfoa_c_Log2 -0.1213922819
## hs_pfos_c_Log2 -0.0523978803
## hs_pfunda_c_Log2 .
## hs_bpa_cadj_Log2 .
## hs_bupa_cadj_Log2 .
## hs_etpa_cadj_Log2 .
## hs_mepa_cadj_Log2 .
## hs_oxbe_cadj_Log2 .
## hs_prpa_cadj_Log2 0.0004520332
## hs_trcs_cadj_Log2 .
## hs_mbzp_cadj_Log2 0.0467602173
## hs_mecpp_cadj_Log2 .
## hs_mehhp_cadj_Log2 .
## hs_mehp_cadj_Log2 .
## hs_meohp_cadj_Log2 .
## hs_mep_cadj_Log2 .
## hs_mibp_cadj_Log2 -0.0313108890
## hs_mnbp_cadj_Log2 -0.0123108272
## hs_ohminp_cadj_Log2 .
## hs_oxominp_cadj_Log2 .
## FAS_cat_None .
## hs_contactfam_3cat_num_None .
## hs_hm_pers_None -0.0056381866
## hs_participation_3cat_None .
## hs_cotinine_cdich_None .
## hs_globalexp2_None .
## hs_smk_parents_None .
cat("Model with Covariates - Test MSE:", mse_with_covariates, "\n")
## Model with Covariates - Test MSE: 1.173885
cat("Model without Covariates - Test MSE:", mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.203556
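Most rows in the LASSO coefficient listings print as `.`, meaning the coefficient is exactly zero at `lambda.min`. glmnet fits this model by cyclical coordinate descent, where each coefficient update is a soft-thresholding step: a coefficient whose partial correlation with the residual is smaller than lambda is set to exactly zero rather than merely shrunk. The sketch below is a minimal base-R illustration of that idea, assuming the Gaussian objective (1/(2n)) * RSS + lambda * ||b||_1 on centred data with no intercept. `soft_threshold` and `lasso_cd` are hypothetical helpers, and the sketch leaves out glmnet's internal standardization, warm starts and active-set shortcuts.

```r
# Soft-thresholding operator: S(z, g) = sign(z) * max(|z| - g, 0)
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Cyclical coordinate descent for the LASSO objective
#   (1 / (2n)) * ||y - X b||^2 + lambda * ||b||_1
# Assumes y and the columns of X are centred, so no intercept is fitted.
lasso_cd <- function(X, y, lambda, max_iter = 500, tol = 1e-10) {
  n <- nrow(X)
  b <- numeric(ncol(X))
  r <- y                                  # residual y - X b, with b = 0 at the start
  col_ss <- colSums(X^2) / n              # (1/n) * x_j' x_j for each column
  for (iter in seq_len(max_iter)) {
    b_old <- b
    for (j in seq_along(b)) {
      # (1/n) * x_j' (partial residual that excludes coordinate j)
      z <- sum(X[, j] * r) / n + col_ss[j] * b[j]
      b_j <- soft_threshold(z, lambda) / col_ss[j]
      r <- r - X[, j] * (b_j - b[j])      # keep the residual in sync with b
      b[j] <- b_j
    }
    if (max(abs(b - b_old)) < tol) break
  }
  b
}

# Simulated example: only the first three of ten predictors matter
set.seed(1)
n <- 200; p <- 10
X <- scale(matrix(rnorm(n * p), n, p))
y <- drop(X %*% c(2, -1.5, 1, rep(0, p - 3)) + rnorm(n))
y <- y - mean(y)
lambda <- 0.3
b_hat <- lasso_cd(X, y, lambda)
round(b_hat, 3)
```

A useful check on any LASSO solution is its optimality (KKT) condition: every zero coefficient has |x_j' r| / n <= lambda, and every nonzero one has x_j' r / n = lambda * sign(b_j).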
# RIDGE
fit_with_covariates <- cv.glmnet(x_full_train, y_train, alpha = 0, family = "gaussian")
predictions_with_covariates <- predict(fit_with_covariates, s = "lambda.min", newx = x_full_test)
mse_with_covariates <- mean((y_test - predictions_with_covariates)^2)
fit_without_covariates <- cv.glmnet(x_combined_train, y_train, alpha = 0, family = "gaussian")
predictions_without_covariates <- predict(fit_without_covariates, s = "lambda.min", newx = x_combined_test)
mse_without_covariates <- mean((y_test - predictions_without_covariates)^2)
plot(fit_with_covariates$glmnet.fit, xvar = "lambda", main = "Coefficient Path (With Covariates)")
plot(fit_without_covariates$glmnet.fit, xvar = "lambda", main = "Coefficient Path (Without Covariates)")
best_lambda <- fit_with_covariates$lambda.min # lambda with the lowest cross-validated MSE
coef(fit_with_covariates, s = best_lambda) # coefficients at the chosen lambda
## 113 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -3.5861814106
## h_bfdur_Ter(0,10.8] -0.0779257475
## h_bfdur_Ter(10.8,34.9] 0.0008585306
## h_bfdur_Ter(34.9,Inf] 0.0763578871
## hs_bakery_prod_Ter(2,6] -0.0232612180
## hs_bakery_prod_Ter(6,Inf] -0.1594193125
## hs_beverages_Ter(0.132,1] 0.0028931119
## hs_beverages_Ter(1,Inf] -0.0324969479
## hs_break_cer_Ter(1.1,5.5] 0.0038074159
## hs_break_cer_Ter(5.5,Inf] -0.0550061557
## hs_caff_drink_Ter(0.132,Inf] 0.0203377311
## hs_dairy_Ter(14.6,25.6] 0.0400994878
## hs_dairy_Ter(25.6,Inf] -0.0012457772
## hs_fastfood_Ter(0.132,0.5] 0.0601484486
## hs_fastfood_Ter(0.5,Inf] -0.0272351131
## h_legume_preg_Ter(0.5,2] 0.0184597362
## h_legume_preg_Ter(2,Inf] -0.0512443355
## hs_org_food_Ter(0.132,1] 0.0299325601
## hs_org_food_Ter(1,Inf] -0.0448649605
## hs_proc_meat_Ter(1.5,4] 0.0006243976
## hs_proc_meat_Ter(4,Inf] -0.0194999904
## hs_readymade_Ter(0.132,0.5] 0.0246995881
## hs_readymade_Ter(0.5,Inf] 0.0632237345
## hs_total_bread_Ter(7,17.5] -0.0690927270
## hs_total_bread_Ter(17.5,Inf] 0.0165408959
## hs_total_cereal_Ter(14.1,23.6] 0.0029703870
## hs_total_cereal_Ter(23.6,Inf] 0.0273109910
## hs_total_fish_Ter(1.5,3] -0.0498949858
## hs_total_fish_Ter(3,Inf] -0.0090873302
## hs_total_fruits_Ter(7,14.1] 0.0336802077
## hs_total_fruits_Ter(14.1,Inf] -0.0368074311
## hs_total_lipids_Ter(3,7] -0.0018898328
## hs_total_lipids_Ter(7,Inf] -0.0606809337
## hs_total_meat_Ter(6,9] 0.0124691415
## hs_total_meat_Ter(9,Inf] -0.0071145855
## hs_total_potatoes_Ter(3,4] 0.0377787307
## hs_total_potatoes_Ter(4,Inf] -0.0090633482
## hs_total_sweets_Ter(4.1,8.5] -0.0671772450
## hs_total_sweets_Ter(8.5,Inf] 0.0004650383
## hs_total_veg_Ter(6,8.5] 0.0055347534
## hs_total_veg_Ter(8.5,Inf] -0.0337870870
## hs_total_yog_Ter(6,8.5] -0.0213665643
## hs_total_yog_Ter(8.5,Inf] -0.0303531094
## hs_as_c_Log2 0.0040435851
## hs_cd_c_Log2 -0.0324347369
## hs_co_c_Log2 -0.0425353065
## hs_cs_c_Log2 0.1221782900
## hs_cu_c_Log2 0.5402150025
## hs_hg_c_Log2 -0.0229618178
## hs_mn_c_Log2 0.0050341797
## hs_mo_c_Log2 -0.0847200038
## hs_pb_c_Log2 -0.0293800014
## hs_tl_cdich_None .
## hs_dde_cadj_Log2 -0.0454641675
## hs_ddt_cadj_Log2 0.0044273954
## hs_hcb_cadj_Log2 -0.0540715945
## hs_pcb118_cadj_Log2 0.0079995568
## hs_pcb138_cadj_Log2 -0.0569704526
## hs_pcb153_cadj_Log2 -0.1358711722
## hs_pcb170_cadj_Log2 -0.0434809137
## hs_pcb180_cadj_Log2 -0.0256847965
## hs_dep_cadj_Log2 -0.0168233284
## hs_detp_cadj_Log2 0.0053640467
## hs_dmdtp_cdich_None .
## hs_dmp_cadj_Log2 -0.0017685990
## hs_dmtp_cadj_Log2 -0.0001932927
## hs_pbde153_cadj_Log2 -0.0272900235
## hs_pbde47_cadj_Log2 0.0089523624
## hs_pfhxs_c_Log2 -0.0141314890
## hs_pfna_c_Log2 -0.0052906747
## hs_pfoa_c_Log2 -0.1143166079
## hs_pfos_c_Log2 -0.0297311540
## hs_pfunda_c_Log2 0.0099738360
## hs_bpa_cadj_Log2 -0.0065661658
## hs_bupa_cadj_Log2 0.0032086202
## hs_etpa_cadj_Log2 -0.0038006820
## hs_mepa_cadj_Log2 -0.0125845244
## hs_oxbe_cadj_Log2 0.0072125426
## hs_prpa_cadj_Log2 0.0053787515
## hs_trcs_cadj_Log2 0.0034326693
## hs_mbzp_cadj_Log2 0.0446334946
## hs_mecpp_cadj_Log2 -0.0045884088
## hs_mehhp_cadj_Log2 0.0017001779
## hs_mehp_cadj_Log2 -0.0042613382
## hs_meohp_cadj_Log2 0.0010516738
## hs_mep_cadj_Log2 0.0052513081
## hs_mibp_cadj_Log2 -0.0287179511
## hs_mnbp_cadj_Log2 -0.0274492557
## hs_ohminp_cadj_Log2 -0.0229716236
## hs_oxominp_cadj_Log2 0.0093025927
## FAS_cat_None .
## hs_contactfam_3cat_num_None .
## hs_hm_pers_None -0.0161682014
## hs_participation_3cat_None .
## hs_cotinine_cdich_None .
## hs_globalexp2_None .
## hs_smk_parents_None .
## e3_sex_Nonefemale -0.0792980597
## e3_sex_Nonemale 0.0792766913
## e3_yearbir_None2004 -0.1104559510
## e3_yearbir_None2005 0.0544534231
## e3_yearbir_None2006 -0.0140080942
## e3_yearbir_None2007 0.0036495133
## e3_yearbir_None2008 0.0250943044
## e3_yearbir_None2009 0.0019301269
## h_edumc_None2 0.0276842155
## h_edumc_None3 0.0264802460
## h_cohort2 -0.1038841854
## h_cohort3 0.2164127464
## h_cohort4 0.2072617618
## h_cohort5 -0.0489596800
## h_cohort6 0.0728739983
## hs_child_age_None -0.0096555759
best_lambda <- fit_without_covariates$lambda.min # lambda with the lowest cross-validated MSE
coef(fit_without_covariates, s = best_lambda)
## 97 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -3.7182292825
## h_bfdur_Ter(0,10.8] -0.0860797524
## h_bfdur_Ter(10.8,34.9] 0.0157690142
## h_bfdur_Ter(34.9,Inf] 0.0738891964
## hs_bakery_prod_Ter(2,6] -0.0004921922
## hs_bakery_prod_Ter(6,Inf] -0.1550947799
## hs_beverages_Ter(0.132,1] 0.0056005297
## hs_beverages_Ter(1,Inf] -0.0294498010
## hs_break_cer_Ter(1.1,5.5] 0.0041625456
## hs_break_cer_Ter(5.5,Inf] -0.0496482050
## hs_caff_drink_Ter(0.132,Inf] 0.0132074628
## hs_dairy_Ter(14.6,25.6] 0.0383328785
## hs_dairy_Ter(25.6,Inf] -0.0156932735
## hs_fastfood_Ter(0.132,0.5] 0.0656064842
## hs_fastfood_Ter(0.5,Inf] -0.0279027967
## h_legume_preg_Ter(0.5,2] 0.0518026147
## h_legume_preg_Ter(2,Inf] -0.0515681232
## hs_org_food_Ter(0.132,1] 0.0293115265
## hs_org_food_Ter(1,Inf] -0.0475840779
## hs_proc_meat_Ter(1.5,4] 0.0053011903
## hs_proc_meat_Ter(4,Inf] -0.0118136041
## hs_readymade_Ter(0.132,0.5] 0.0286721450
## hs_readymade_Ter(0.5,Inf] 0.0580188327
## hs_total_bread_Ter(7,17.5] -0.0530077370
## hs_total_bread_Ter(17.5,Inf] 0.0120103801
## hs_total_cereal_Ter(14.1,23.6] 0.0012320991
## hs_total_cereal_Ter(23.6,Inf] 0.0156516499
## hs_total_fish_Ter(1.5,3] -0.0666413614
## hs_total_fish_Ter(3,Inf] 0.0083079643
## hs_total_fruits_Ter(7,14.1] 0.0321951554
## hs_total_fruits_Ter(14.1,Inf] -0.0418966990
## hs_total_lipids_Ter(3,7] -0.0108587395
## hs_total_lipids_Ter(7,Inf] -0.0779530877
## hs_total_meat_Ter(6,9] 0.0174718115
## hs_total_meat_Ter(9,Inf] 0.0060858362
## hs_total_potatoes_Ter(3,4] 0.0526652537
## hs_total_potatoes_Ter(4,Inf] -0.0086984587
## hs_total_sweets_Ter(4.1,8.5] -0.0702429607
## hs_total_sweets_Ter(8.5,Inf] -0.0035306199
## hs_total_veg_Ter(6,8.5] 0.0041934127
## hs_total_veg_Ter(8.5,Inf] -0.0551622414
## hs_total_yog_Ter(6,8.5] -0.0205709488
## hs_total_yog_Ter(8.5,Inf] -0.0350202977
## hs_as_c_Log2 0.0046556915
## hs_cd_c_Log2 -0.0345175260
## hs_co_c_Log2 -0.0408949090
## hs_cs_c_Log2 0.0841409453
## hs_cu_c_Log2 0.5332936404
## hs_hg_c_Log2 -0.0262284450
## hs_mn_c_Log2 -0.0160626952
## hs_mo_c_Log2 -0.0835014660
## hs_pb_c_Log2 -0.0220059774
## hs_tl_cdich_None .
## hs_dde_cadj_Log2 -0.0362905790
## hs_ddt_cadj_Log2 0.0037526990
## hs_hcb_cadj_Log2 -0.0310188009
## hs_pcb118_cadj_Log2 0.0073013802
## hs_pcb138_cadj_Log2 -0.0532103641
## hs_pcb153_cadj_Log2 -0.1242985374
## hs_pcb170_cadj_Log2 -0.0419218945
## hs_pcb180_cadj_Log2 -0.0242896915
## hs_dep_cadj_Log2 -0.0188114284
## hs_detp_cadj_Log2 0.0054139973
## hs_dmdtp_cdich_None .
## hs_dmp_cadj_Log2 -0.0025809251
## hs_dmtp_cadj_Log2 0.0009658477
## hs_pbde153_cadj_Log2 -0.0275122930
## hs_pbde47_cadj_Log2 0.0058804616
## hs_pfhxs_c_Log2 -0.0297015899
## hs_pfna_c_Log2 -0.0065309195
## hs_pfoa_c_Log2 -0.1074314041
## hs_pfos_c_Log2 -0.0474644980
## hs_pfunda_c_Log2 0.0066413412
## hs_bpa_cadj_Log2 -0.0070499072
## hs_bupa_cadj_Log2 0.0037281508
## hs_etpa_cadj_Log2 -0.0047185955
## hs_mepa_cadj_Log2 -0.0097560575
## hs_oxbe_cadj_Log2 0.0091813336
## hs_prpa_cadj_Log2 0.0062288642
## hs_trcs_cadj_Log2 0.0059294563
## hs_mbzp_cadj_Log2 0.0412654953
## hs_mecpp_cadj_Log2 0.0062824833
## hs_mehhp_cadj_Log2 0.0125842196
## hs_mehp_cadj_Log2 -0.0053456689
## hs_meohp_cadj_Log2 0.0091034417
## hs_mep_cadj_Log2 0.0064257842
## hs_mibp_cadj_Log2 -0.0341283015
## hs_mnbp_cadj_Log2 -0.0329934970
## hs_ohminp_cadj_Log2 -0.0208420975
## hs_oxominp_cadj_Log2 0.0114099235
## FAS_cat_None .
## hs_contactfam_3cat_num_None .
## hs_hm_pers_None -0.0228004897
## hs_participation_3cat_None .
## hs_cotinine_cdich_None .
## hs_globalexp2_None .
## hs_smk_parents_None .
cat("Model with Covariates - Test MSE:", mse_with_covariates, "\n")
## Model with Covariates - Test MSE: 1.123193
cat("Model without Covariates - Test MSE:", mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.155814
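`lambda.min` is chosen by K-fold cross-validation (cv.glmnet uses 10 folds by default): for each lambda on a grid the model is refit without one fold, the squared error on that held-out fold is recorded, and the lambda with the lowest average error is returned. Ridge is convenient for illustrating this because each fit has a closed form. A base-R sketch under stated assumptions: `ridge_fit` and `cv_ridge` are hypothetical helpers, the objective uses glmnet's scaling with alpha = 0, folds are averaged without weights, and, unlike glmnet, the predictors are not standardized internally (the simulated columns already have unit variance).

```r
# Closed-form ridge solution for the glmnet-style objective with alpha = 0:
#   (1 / (2n)) * ||y - X b||^2 + (lambda / 2) * ||b||^2
#   =>  b = (X'X / n + lambda * I)^(-1) X'y / n     (X and y centred; no intercept)
ridge_fit <- function(X, y, lambda) {
  n <- nrow(X)
  drop(solve(crossprod(X) / n + lambda * diag(ncol(X)), crossprod(X, y) / n))
}

# K-fold cross-validation over a lambda grid; lambda_min plays the role of cv.glmnet's lambda.min
cv_ridge <- function(X, y, lambdas, nfolds = 10) {
  folds <- sample(rep(seq_len(nfolds), length.out = nrow(X)))
  cv_mse <- sapply(lambdas, function(lam) {
    fold_mse <- sapply(seq_len(nfolds), function(k) {
      train <- folds != k
      x_mu <- colMeans(X[train, , drop = FALSE])   # centre with training-fold means only
      y_mu <- mean(y[train])
      b <- ridge_fit(sweep(X[train, , drop = FALSE], 2, x_mu), y[train] - y_mu, lam)
      pred <- y_mu + drop(sweep(X[!train, , drop = FALSE], 2, x_mu) %*% b)
      mean((y[!train] - pred)^2)                   # MSE on the held-out fold
    })
    mean(fold_mse)
  })
  list(lambda = lambdas, cv_mse = cv_mse, lambda_min = lambdas[which.min(cv_mse)])
}

# Simulated example: 20 predictors, 5 with small true effects
set.seed(2)
n <- 150; p <- 20
X <- matrix(rnorm(n * p), n, p)
y <- drop(X[, 1:5] %*% rep(0.5, 5) + rnorm(n))
cv <- cv_ridge(X, y, lambdas = 10^seq(1, -3, length.out = 30))
cv$lambda_min
```

Because ridge shrinks coefficients without thresholding them, nearly every coefficient in the ridge listings is nonzero, in contrast to the LASSO output.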
# ELASTIC NET
fit_with_covariates <- cv.glmnet(x_full_train, y_train, alpha = 0.5, family = "gaussian")
predictions_with_covariates <- predict(fit_with_covariates, s = "lambda.min", newx = x_full_test)
mse_with_covariates <- mean((y_test - predictions_with_covariates)^2)
fit_without_covariates <- cv.glmnet(x_combined_train, y_train, alpha = 0.5, family = "gaussian")
predictions_without_covariates <- predict(fit_without_covariates, s = "lambda.min", newx = x_combined_test)
mse_without_covariates <- mean((y_test - predictions_without_covariates)^2)
plot(fit_with_covariates$glmnet.fit, xvar = "lambda", main = "Coefficient Path (With Covariates)")
plot(fit_without_covariates$glmnet.fit, xvar = "lambda", main = "Coefficient Path (Without Covariates)")
best_lambda <- fit_with_covariates$lambda.min # lambda with the lowest cross-validated MSE
coef(fit_with_covariates, s = best_lambda) # coefficients at the chosen lambda
## 113 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -5.258333400
## h_bfdur_Ter(0,10.8] -0.082442759
## h_bfdur_Ter(10.8,34.9] .
## h_bfdur_Ter(34.9,Inf] 0.079982963
## hs_bakery_prod_Ter(2,6] .
## hs_bakery_prod_Ter(6,Inf] -0.201616093
## hs_beverages_Ter(0.132,1] .
## hs_beverages_Ter(1,Inf] .
## hs_break_cer_Ter(1.1,5.5] .
## hs_break_cer_Ter(5.5,Inf] .
## hs_caff_drink_Ter(0.132,Inf] .
## hs_dairy_Ter(14.6,25.6] 0.005203095
## hs_dairy_Ter(25.6,Inf] .
## hs_fastfood_Ter(0.132,0.5] 0.059880567
## hs_fastfood_Ter(0.5,Inf] .
## h_legume_preg_Ter(0.5,2] .
## h_legume_preg_Ter(2,Inf] -0.056124414
## hs_org_food_Ter(0.132,1] 0.005083293
## hs_org_food_Ter(1,Inf] .
## hs_proc_meat_Ter(1.5,4] .
## hs_proc_meat_Ter(4,Inf] .
## hs_readymade_Ter(0.132,0.5] .
## hs_readymade_Ter(0.5,Inf] 0.015984706
## hs_total_bread_Ter(7,17.5] -0.050862711
## hs_total_bread_Ter(17.5,Inf] .
## hs_total_cereal_Ter(14.1,23.6] .
## hs_total_cereal_Ter(23.6,Inf] .
## hs_total_fish_Ter(1.5,3] .
## hs_total_fish_Ter(3,Inf] .
## hs_total_fruits_Ter(7,14.1] 0.002674839
## hs_total_fruits_Ter(14.1,Inf] -0.014334189
## hs_total_lipids_Ter(3,7] .
## hs_total_lipids_Ter(7,Inf] -0.026344671
## hs_total_meat_Ter(6,9] .
## hs_total_meat_Ter(9,Inf] .
## hs_total_potatoes_Ter(3,4] 0.009514919
## hs_total_potatoes_Ter(4,Inf] .
## hs_total_sweets_Ter(4.1,8.5] -0.028662773
## hs_total_sweets_Ter(8.5,Inf] .
## hs_total_veg_Ter(6,8.5] .
## hs_total_veg_Ter(8.5,Inf] -0.004184539
## hs_total_yog_Ter(6,8.5] .
## hs_total_yog_Ter(8.5,Inf] .
## hs_as_c_Log2 .
## hs_cd_c_Log2 -0.016093261
## hs_co_c_Log2 -0.003072533
## hs_cs_c_Log2 0.146384847
## hs_cu_c_Log2 0.695195647
## hs_hg_c_Log2 -0.021688344
## hs_mn_c_Log2 .
## hs_mo_c_Log2 -0.093103311
## hs_pb_c_Log2 -0.022060040
## hs_tl_cdich_None .
## hs_dde_cadj_Log2 -0.043119699
## hs_ddt_cadj_Log2 .
## hs_hcb_cadj_Log2 -0.020701577
## hs_pcb118_cadj_Log2 .
## hs_pcb138_cadj_Log2 .
## hs_pcb153_cadj_Log2 -0.279263147
## hs_pcb170_cadj_Log2 -0.056595859
## hs_pcb180_cadj_Log2 .
## hs_dep_cadj_Log2 -0.016828799
## hs_detp_cadj_Log2 .
## hs_dmdtp_cdich_None .
## hs_dmp_cadj_Log2 .
## hs_dmtp_cadj_Log2 .
## hs_pbde153_cadj_Log2 -0.032258855
## hs_pbde47_cadj_Log2 .
## hs_pfhxs_c_Log2 .
## hs_pfna_c_Log2 .
## hs_pfoa_c_Log2 -0.129555577
## hs_pfos_c_Log2 .
## hs_pfunda_c_Log2 .
## hs_bpa_cadj_Log2 .
## hs_bupa_cadj_Log2 .
## hs_etpa_cadj_Log2 .
## hs_mepa_cadj_Log2 -0.006719327
## hs_oxbe_cadj_Log2 .
## hs_prpa_cadj_Log2 .
## hs_trcs_cadj_Log2 .
## hs_mbzp_cadj_Log2 0.046066744
## hs_mecpp_cadj_Log2 .
## hs_mehhp_cadj_Log2 .
## hs_mehp_cadj_Log2 .
## hs_meohp_cadj_Log2 .
## hs_mep_cadj_Log2 .
## hs_mibp_cadj_Log2 -0.028742762
## hs_mnbp_cadj_Log2 -0.002308112
## hs_ohminp_cadj_Log2 -0.002398690
## hs_oxominp_cadj_Log2 .
## FAS_cat_None .
## hs_contactfam_3cat_num_None .
## hs_hm_pers_None .
## hs_participation_3cat_None .
## hs_cotinine_cdich_None .
## hs_globalexp2_None .
## hs_smk_parents_None .
## e3_sex_Nonefemale -0.073233783
## e3_sex_Nonemale 0.065324705
## e3_yearbir_None2004 -0.095696031
## e3_yearbir_None2005 .
## e3_yearbir_None2006 .
## e3_yearbir_None2007 .
## e3_yearbir_None2008 .
## e3_yearbir_None2009 .
## h_edumc_None2 .
## h_edumc_None3 .
## h_cohort2 -0.074809073
## h_cohort3 0.391964861
## h_cohort4 0.310207397
## h_cohort5 .
## h_cohort6 0.073899674
## hs_child_age_None .
best_lambda <- fit_without_covariates$lambda.min # lambda with the lowest cross-validated MSE
coef(fit_without_covariates, s = best_lambda)
## 97 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -5.1133010038
## h_bfdur_Ter(0,10.8] -0.1257812987
## h_bfdur_Ter(10.8,34.9] .
## h_bfdur_Ter(34.9,Inf] 0.0098695900
## hs_bakery_prod_Ter(2,6] .
## hs_bakery_prod_Ter(6,Inf] -0.2052707951
## hs_beverages_Ter(0.132,1] .
## hs_beverages_Ter(1,Inf] .
## hs_break_cer_Ter(1.1,5.5] .
## hs_break_cer_Ter(5.5,Inf] .
## hs_caff_drink_Ter(0.132,Inf] .
## hs_dairy_Ter(14.6,25.6] 0.0083014798
## hs_dairy_Ter(25.6,Inf] .
## hs_fastfood_Ter(0.132,0.5] 0.0723166735
## hs_fastfood_Ter(0.5,Inf] .
## h_legume_preg_Ter(0.5,2] 0.0101366309
## h_legume_preg_Ter(2,Inf] -0.0853205645
## hs_org_food_Ter(0.132,1] .
## hs_org_food_Ter(1,Inf] -0.0004524184
## hs_proc_meat_Ter(1.5,4] .
## hs_proc_meat_Ter(4,Inf] .
## hs_readymade_Ter(0.132,0.5] .
## hs_readymade_Ter(0.5,Inf] 0.0083640118
## hs_total_bread_Ter(7,17.5] -0.0123257688
## hs_total_bread_Ter(17.5,Inf] .
## hs_total_cereal_Ter(14.1,23.6] .
## hs_total_cereal_Ter(23.6,Inf] .
## hs_total_fish_Ter(1.5,3] -0.0285577491
## hs_total_fish_Ter(3,Inf] .
## hs_total_fruits_Ter(7,14.1] .
## hs_total_fruits_Ter(14.1,Inf] -0.0228116118
## hs_total_lipids_Ter(3,7] .
## hs_total_lipids_Ter(7,Inf] -0.0485418209
## hs_total_meat_Ter(6,9] .
## hs_total_meat_Ter(9,Inf] .
## hs_total_potatoes_Ter(3,4] 0.0168351467
## hs_total_potatoes_Ter(4,Inf] .
## hs_total_sweets_Ter(4.1,8.5] -0.0198599950
## hs_total_sweets_Ter(8.5,Inf] .
## hs_total_veg_Ter(6,8.5] .
## hs_total_veg_Ter(8.5,Inf] -0.0429851064
## hs_total_yog_Ter(6,8.5] .
## hs_total_yog_Ter(8.5,Inf] .
## hs_as_c_Log2 .
## hs_cd_c_Log2 -0.0258899345
## hs_co_c_Log2 -0.0092126979
## hs_cs_c_Log2 0.0676393049
## hs_cu_c_Log2 0.6693489253
## hs_hg_c_Log2 -0.0152313991
## hs_mn_c_Log2 .
## hs_mo_c_Log2 -0.0992111766
## hs_pb_c_Log2 .
## hs_tl_cdich_None .
## hs_dde_cadj_Log2 -0.0299730899
## hs_ddt_cadj_Log2 .
## hs_hcb_cadj_Log2 .
## hs_pcb118_cadj_Log2 .
## hs_pcb138_cadj_Log2 .
## hs_pcb153_cadj_Log2 -0.2311607073
## hs_pcb170_cadj_Log2 -0.0546601693
## hs_pcb180_cadj_Log2 .
## hs_dep_cadj_Log2 -0.0188961844
## hs_detp_cadj_Log2 .
## hs_dmdtp_cdich_None .
## hs_dmp_cadj_Log2 .
## hs_dmtp_cadj_Log2 .
## hs_pbde153_cadj_Log2 -0.0351087073
## hs_pbde47_cadj_Log2 .
## hs_pfhxs_c_Log2 -0.0017727290
## hs_pfna_c_Log2 .
## hs_pfoa_c_Log2 -0.1215693401
## hs_pfos_c_Log2 -0.0511509209
## hs_pfunda_c_Log2 .
## hs_bpa_cadj_Log2 .
## hs_bupa_cadj_Log2 .
## hs_etpa_cadj_Log2 .
## hs_mepa_cadj_Log2 .
## hs_oxbe_cadj_Log2 .
## hs_prpa_cadj_Log2 0.0003657529
## hs_trcs_cadj_Log2 .
## hs_mbzp_cadj_Log2 0.0440420234
## hs_mecpp_cadj_Log2 .
## hs_mehhp_cadj_Log2 .
## hs_mehp_cadj_Log2 .
## hs_meohp_cadj_Log2 .
## hs_mep_cadj_Log2 .
## hs_mibp_cadj_Log2 -0.0300733513
## hs_mnbp_cadj_Log2 -0.0122025518
## hs_ohminp_cadj_Log2 .
## hs_oxominp_cadj_Log2 .
## FAS_cat_None .
## hs_contactfam_3cat_num_None .
## hs_hm_pers_None -0.0055372885
## hs_participation_3cat_None .
## hs_cotinine_cdich_None .
## hs_globalexp2_None .
## hs_smk_parents_None .
cat("Model with Covariates - Test MSE:", mse_with_covariates, "\n")
## Model with Covariates - Test MSE: 1.155518
cat("Model without Covariates - Test MSE:", mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.200805
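The three penalized fits differ only in alpha. glmnet's penalty is lambda * ((1 - alpha)/2 * ||b||_2^2 + alpha * ||b||_1), and under the simplifying assumption of an orthonormal design (X'X/n = I) each coefficient has the closed form b_j = S(b_ols_j, lambda * alpha) / (1 + lambda * (1 - alpha)), where S is soft-thresholding. alpha = 1 gives pure thresholding (LASSO), alpha = 0 pure proportional shrinkage (ridge), and alpha = 0.5 a mix of both. This is only a qualitative guide to the three coefficient listings: the real design is not orthonormal, and each fit chose its own lambda by cross-validation. `enet_orthonormal` is a hypothetical helper.

```r
# Closed-form elastic-net coefficients under an orthonormal design (X'X / n = I),
# for the glmnet penalty lambda * ((1 - alpha) / 2 * b^2 + alpha * |b|).
# b_ols: unpenalized least-squares coefficients.
enet_orthonormal <- function(b_ols, lambda, alpha) {
  # soft-threshold by lambda * alpha, then shrink proportionally by 1 + lambda * (1 - alpha)
  sign(b_ols) * pmax(abs(b_ols) - lambda * alpha, 0) / (1 + lambda * (1 - alpha))
}

b_ols <- c(0.60, 0.25, -0.08, 0.02, -0.40)
lambda <- 0.1
rbind(
  lasso = enet_orthonormal(b_ols, lambda, alpha = 1),
  enet  = enet_orthonormal(b_ols, lambda, alpha = 0.5),
  ridge = enet_orthonormal(b_ols, lambda, alpha = 0)
)
```

At lambda = 0.1 the LASSO row zeroes the two smallest coefficients, the elastic-net row (threshold lambda * alpha = 0.05) zeroes only the smallest, and the ridge row keeps all five, each divided by 1.1.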
Open question (discussed last week): how to cluster individuals so that each one can be characterized as having high, medium, or low exposure.
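One possible way to act on this, sketched under assumptions: summarize each child's exposures into a single score, then label children low, medium or high either by tertiles of that score or by one-dimensional k-means with three clusters (relabelled by the order of their centres, because k-means numbers clusters arbitrarily). The summary score (mean of the standardized exposures) and the helper `classify_exposure` are hypothetical choices, not part of the analysis here; per-chemical tertiles or a weighted score would be equally reasonable starting points.

```r
# Hypothetical helper: collapse several exposures (e.g. log2-transformed chemical
# concentrations) into one score per child, then label each child low / medium / high.
classify_exposure <- function(exposures, method = c("tertile", "kmeans")) {
  method <- match.arg(method)
  # Assumption: the summary score is the mean of the standardized exposures
  score <- rowMeans(scale(exposures))
  if (method == "tertile") {
    breaks <- quantile(score, probs = c(0, 1/3, 2/3, 1))
    level <- cut(score, breaks = breaks, labels = c("low", "medium", "high"),
                 include.lowest = TRUE)
  } else {
    km <- kmeans(score, centers = 3, nstart = 25)
    # k-means numbers its clusters arbitrarily; relabel them by the order of the centres
    centre_rank <- rank(km$centers[, 1])
    level <- factor(c("low", "medium", "high")[centre_rank[km$cluster]],
                    levels = c("low", "medium", "high"))
  }
  data.frame(score = score, level = level)
}

# Simulated exposures for 300 children and 4 hypothetical chemicals
set.seed(3)
exposures <- matrix(rnorm(300 * 4), ncol = 4)
by_tertile <- classify_exposure(exposures, "tertile")
by_kmeans  <- classify_exposure(exposures, "kmeans")
table(by_tertile$level)
tapply(by_kmeans$score, by_kmeans$level, mean)
```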
set.seed(101)
num_chemical_diet <- ncol(x_chemical_train) + ncol(x_diet_train)
num_covariates <- ncol(x_covariates_train)
# stop if these do not add up to the number of columns in x_full_train
if((num_chemical_diet + num_covariates) != ncol(x_full_train)) {
stop("Mismatch in expected column counts")
}
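# define groups: 1 = chemical + diet columns, 2 = covariates
# (assumes x_full_train lists the chemical/diet columns before the covariates)
groups <- c(rep(1, num_chemical_diet), rep(2, num_covariates))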
# make sure all columns are numeric
x_full_train <- data.frame(x_full_train)
x_full_train[] <- lapply(x_full_train, function(x) {
if(is.factor(x) || is.character(x)) {
as.numeric(as.factor(x))
} else {
x
}
})
# to ensure all variables numeric
if(any(sapply(x_full_train, function(x) !is.numeric(x)))) {
stop("Some columns are still not numeric.")
}
model_group_lasso <- grpreg(x_full_train, y_train, group = groups, penalty = "grLasso")
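With `penalty = "grLasso"`, grpreg penalizes each group by the Euclidean norm of its coefficient vector, by default weighted by the square root of the group size. In the simplest case (orthonormal columns within each group) the solution shrinks a group's coefficients toward zero as a block, and drops the whole group once its norm falls below the threshold. Because `groups` above has only two groups (all chemical and diet columns versus all covariates), this model can only keep or drop each block as a whole; it does not select individual exposures. A minimal sketch of the block shrinkage, assuming orthonormal groups; `group_soft_threshold` is a hypothetical helper, not a grpreg function.

```r
# Group soft-thresholding: the block-wise shrinkage behind the group LASSO penalty
#   lambda * sum_g sqrt(p_g) * ||b_g||_2
# Assuming an orthonormal design within each group, the solution for group g is
#   b_g = max(0, 1 - lambda * sqrt(p_g) / ||z_g||_2) * z_g
# where z_g holds the unpenalized least-squares coefficients of that group.
group_soft_threshold <- function(z, groups, lambda) {
  out <- numeric(length(z))
  for (g in unique(groups)) {
    idx <- which(groups == g)
    norm_g <- sqrt(sum(z[idx]^2))
    shrink <- if (norm_g > 0) max(0, 1 - lambda * sqrt(length(idx)) / norm_g) else 0
    out[idx] <- shrink * z[idx]           # the whole group is scaled by one factor
  }
  out
}

z_demo <- c(0.30, -0.10, 0.05, 0.02, 0.01)
groups_demo <- c(1, 1, 1, 2, 2)
group_soft_threshold(z_demo, groups_demo, lambda = 0.05)
```

Group 1 is kept as a block (even its 0.05 entry stays nonzero, scaled by the same factor as the others), while group 2, whose norm is below lambda * sqrt(2), is dropped entirely.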
# x_full_test as numeric
x_full_test <- data.frame(x_full_test)
x_full_test[] <- lapply(x_full_test, function(x) {
if(is.factor(x) || is.character(x)) {
as.numeric(as.factor(x))
} else {
x
}
})
x_scaled <- scale(x_full_train)
wss <- sapply(1:15, function(k) {
kmeans(x_scaled, centers = k, nstart = 20)$tot.withinss
})
plot(1:15, wss, type = "b", pch = 19, frame = FALSE,
xlab = "Number of clusters K", ylab = "Total within-clusters sum of squares")
# elbow: the k with the largest second difference (sharpest flattening) of the WSS curve
k <- which.max(diff(diff(wss))) + 1
# k-means clustering with the selected number of clusters
km <- kmeans(x_scaled, centers = k, nstart = 25)
# plot cluster assignment
clusplot(x_scaled, km$cluster, color=TRUE, shade=TRUE,
labels=2, lines=0, main=paste("K-means Clustering with", k, "clusters"))
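The second-difference elbow rule can be checked on a toy curve. The quantity wss[k-1] - 2*wss[k] + wss[k+1] measures how sharply the within-cluster sum of squares flattens at k, so the elbow is where it is largest; taking the minimum instead tends to land on the already-flat tail. The WSS values below are made up for illustration, `elbow_k` is a hypothetical helper, and the rule is only a heuristic (silhouette widths or the gap statistic are common alternatives).

```r
# Elbow rule via second differences of a decreasing WSS curve.
# d2[i] = wss[i] - 2 * wss[i + 1] + wss[i + 2] measures the flattening at k = i + 1,
# so the elbow is the k with the largest second difference.
elbow_k <- function(wss) which.max(diff(diff(wss))) + 1

# Made-up WSS values with a visible elbow at k = 3
wss_demo <- c(1000, 600, 250, 220, 200, 185, 175)
elbow_k(wss_demo)                      # 3
which.min(diff(diff(wss_demo))) + 1    # 5, a point on the flat tail
```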
x_full_train[] <- lapply(x_full_train, function(x) if(is.character(x)) factor(x) else x)
x_full_test[] <- lapply(x_full_test, function(x) if(is.character(x)) factor(x) else x)
rf_model <- randomForest(x_full_train, y_train, ntree=500, importance=TRUE)
importance(rf_model)
## %IncMSE IncNodePurity
## h_bfdur_Ter.0.10.8. 1.62394425 2.6472740
## h_bfdur_Ter.10.8.34.9. 1.44728015 2.3187936
## h_bfdur_Ter.34.9.Inf. 0.42661331 2.1787180
## hs_bakery_prod_Ter.2.6. 1.42322939 2.2183957
## hs_bakery_prod_Ter.6.Inf. 3.66856059 3.6033007
## hs_beverages_Ter.0.132.1. 0.30441134 1.5262195
## hs_beverages_Ter.1.Inf. -0.05648233 1.9449951
## hs_break_cer_Ter.1.1.5.5. -0.49893844 1.4411459
## hs_break_cer_Ter.5.5.Inf. 1.58834132 3.6238282
## hs_caff_drink_Ter.0.132.Inf. 2.88031712 1.5850729
## hs_dairy_Ter.14.6.25.6. -0.94873273 1.7573655
## hs_dairy_Ter.25.6.Inf. -0.11199066 1.4038995
## hs_fastfood_Ter.0.132.0.5. -0.73851208 2.4200607
## hs_fastfood_Ter.0.5.Inf. -1.08793318 1.8398958
## h_legume_preg_Ter.0.5.2. 1.02913452 1.5388965
## h_legume_preg_Ter.2.Inf. 2.69014804 2.5865749
## hs_org_food_Ter.0.132.1. 1.36221797 1.8403823
## hs_org_food_Ter.1.Inf. -0.46573537 1.7616331
## hs_proc_meat_Ter.1.5.4. -0.45678186 1.7658075
## hs_proc_meat_Ter.4.Inf. -0.88430858 1.1803357
## hs_readymade_Ter.0.132.0.5. -0.12606145 1.6163844
## hs_readymade_Ter.0.5.Inf. -0.45608826 1.9371251
## hs_total_bread_Ter.7.17.5. 0.15442273 1.5456688
## hs_total_bread_Ter.17.5.Inf. 0.10735063 2.1585239
## hs_total_cereal_Ter.14.1.23.6. -0.66474298 1.3968275
## hs_total_cereal_Ter.23.6.Inf. -0.33065996 1.5798432
## hs_total_fish_Ter.1.5.3. 0.89901191 1.7980006
## hs_total_fish_Ter.3.Inf. 0.03453010 1.4685165
## hs_total_fruits_Ter.7.14.1. -1.01236808 2.2180663
## hs_total_fruits_Ter.14.1.Inf. -1.17879943 1.8142755
## hs_total_lipids_Ter.3.7. 0.99576403 1.6845186
## hs_total_lipids_Ter.7.Inf. 0.90327761 1.8433470
## hs_total_meat_Ter.6.9. -0.39508369 1.2298493
## hs_total_meat_Ter.9.Inf. -1.14930812 1.4839761
## hs_total_potatoes_Ter.3.4. -0.02879509 1.9667421
## hs_total_potatoes_Ter.4.Inf. 0.69057433 1.9766205
## hs_total_sweets_Ter.4.1.8.5. 0.83164986 1.9166902
## hs_total_sweets_Ter.8.5.Inf. 0.05657777 1.4126819
## hs_total_veg_Ter.6.8.5. -0.91608707 1.4140262
## hs_total_veg_Ter.8.5.Inf. -0.10598934 2.3876899
## hs_total_yog_Ter.6.8.5. 1.13542005 1.4888589
## hs_total_yog_Ter.8.5.Inf. 1.10424924 0.7261897
## hs_as_c_Log2 1.64815691 16.6466486
## hs_cd_c_Log2 2.31257107 22.1929226
## hs_co_c_Log2 0.41498301 18.4353751
## hs_cs_c_Log2 1.17594185 18.8005754
## hs_cu_c_Log2 4.18415526 32.5438363
## hs_hg_c_Log2 2.73198756 20.6874755
## hs_mn_c_Log2 0.66078482 17.2305211
## hs_mo_c_Log2 1.85743031 25.5068009
## hs_pb_c_Log2 1.02771155 17.8076216
## hs_tl_cdich_None 0.93838727 1.3112074
## hs_dde_cadj_Log2 9.38861168 30.7654268
## hs_ddt_cadj_Log2 4.63649126 25.9097505
## hs_hcb_cadj_Log2 15.53697838 71.9377305
## hs_pcb118_cadj_Log2 5.09181167 24.8770565
## hs_pcb138_cadj_Log2 11.27674444 46.5834800
## hs_pcb153_cadj_Log2 11.15837325 46.9819668
## hs_pcb170_cadj_Log2 12.85250894 69.7921122
## hs_pcb180_cadj_Log2 8.57654050 30.4415970
## hs_dep_cadj_Log2 0.10535720 19.5425789
## hs_detp_cadj_Log2 0.27224855 22.2342239
## hs_dmdtp_cdich_None -0.18119932 1.3452171
## hs_dmp_cadj_Log2 1.00411247 19.4120449
## hs_dmtp_cadj_Log2 -0.09412237 18.6242483
## hs_pbde153_cadj_Log2 8.61230311 57.2199924
## hs_pbde47_cadj_Log2 -0.49914633 19.0883815
## hs_pfhxs_c_Log2 1.19268773 18.8772269
## hs_pfna_c_Log2 5.31504852 21.6291066
## hs_pfoa_c_Log2 1.70950496 27.4875166
## hs_pfos_c_Log2 4.97902014 28.7867780
## hs_pfunda_c_Log2 0.31877488 17.4486484
## hs_bpa_cadj_Log2 0.28300198 16.1689031
## hs_bupa_cadj_Log2 -0.90177003 21.4241574
## hs_etpa_cadj_Log2 0.12822626 17.8602032
## hs_mepa_cadj_Log2 -0.51241444 16.8508187
## hs_oxbe_cadj_Log2 1.20889453 21.7818671
## hs_prpa_cadj_Log2 -0.21090768 15.7609955
## hs_trcs_cadj_Log2 3.16260000 17.8193371
## hs_mbzp_cadj_Log2 0.58725651 21.1306131
## hs_mecpp_cadj_Log2 1.59100813 13.8922651
## hs_mehhp_cadj_Log2 2.32575192 14.0681085
## hs_mehp_cadj_Log2 0.32123863 14.8664145
## hs_meohp_cadj_Log2 1.34347705 12.8261532
## hs_mep_cadj_Log2 1.46043932 15.4653443
## hs_mibp_cadj_Log2 0.27877928 16.0575165
## hs_mnbp_cadj_Log2 0.33600919 19.2808726
## hs_ohminp_cadj_Log2 5.05913120 23.5103948
## hs_oxominp_cadj_Log2 -0.46482796 17.1762792
## FAS_cat_None 0.21983082 2.5953622
## hs_contactfam_3cat_num_None -0.74059254 2.6902674
## hs_hm_pers_None -0.70098893 4.9503056
## hs_participation_3cat_None 1.12391217 4.9868047
## hs_cotinine_cdich_None 1.96479354 5.2452927
## hs_globalexp2_None 1.92498116 2.3745794
## hs_smk_parents_None 0.66801139 5.1112738
## e3_sex_Nonefemale 1.76864622 2.1742532
## e3_sex_Nonemale 0.95394065 1.6816199
## e3_yearbir_None2004 0.44368971 1.1798504
## e3_yearbir_None2005 1.52565863 1.4685544
## e3_yearbir_None2006 1.27113726 0.8881484
## e3_yearbir_None2007 -0.01228062 1.0683998
## e3_yearbir_None2008 -0.42831674 1.3756556
## e3_yearbir_None2009 0.50666864 0.6236582
## h_edumc_None2 -0.05935642 2.0579226
## h_edumc_None3 0.37341740 1.5026138
## h_cohort2 1.97963049 1.3363036
## h_cohort3 7.92754915 14.7384558
## h_cohort4 5.37343930 4.2456046
## h_cohort5 3.01847986 0.4778185
## h_cohort6 2.26940445 2.5738336
## hs_child_age_None 6.07973673 22.4898732
varImpPlot(rf_model)
# predict on the test set
predictions_rf <- predict(rf_model, x_full_test)
mse_rf <- mean((y_test - predictions_rf)^2)
cat("Random Forest Test MSE:", mse_rf, "\n")
## Random Forest Test MSE: 1.194692
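In the `importance()` table, `%IncMSE` is a permutation importance: randomForest permutes one predictor in each tree's out-of-bag sample, records the increase in MSE, averages over trees and, by default, divides by the standard deviation of those differences. `IncNodePurity` is instead the total decrease in node residual sum of squares from splits on that variable, which tends to favour continuous predictors with many possible split points; that may partly explain why the log2 exposures score far higher on it than the binary diet and covariate dummies. Below is a simplified, model-agnostic version of the permutation idea, assuming any fitted model with a `predict` method; `permutation_importance` is a hypothetical helper, it permutes whatever data it is given (here the training data, for brevity), and it does not reproduce randomForest's per-tree OOB computation or scaling.

```r
# Model-agnostic permutation importance:
#   importance_j = mean MSE after shuffling column j - baseline MSE
permutation_importance <- function(predict_fn, X, y, n_repeats = 10) {
  base_mse <- mean((y - predict_fn(X))^2)
  sapply(colnames(X), function(j) {
    mean(replicate(n_repeats, {
      Xp <- X
      Xp[[j]] <- sample(Xp[[j]])          # break the link between column j and y
      mean((y - predict_fn(Xp))^2)
    })) - base_mse
  })
}

# Simulated example: a matters most, b a little, c not at all
set.seed(4)
df <- data.frame(a = rnorm(500), b = rnorm(500), c = rnorm(500))
y <- 2 * df$a + 0.5 * df$b + rnorm(500)
fit <- lm(y ~ ., data = cbind(df, y = y))
imp <- permutation_importance(function(X) predict(fit, newdata = X), df, y)
sort(imp, decreasing = TRUE)
```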
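# keep the response inside the data frame so the formula does not depend on the global y_train
gbm_model <- gbm(y_train ~ ., data = data.frame(y_train = y_train, x_full_train),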
distribution = "gaussian",
n.trees = 1000,
interaction.depth = 3,
n.minobsinnode = 10,
shrinkage = 0.01,
cv.folds = 5,
verbose = TRUE)
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.4346 nan 0.0100 0.0028
## 2 1.4305 nan 0.0100 0.0029
## 3 1.4259 nan 0.0100 0.0036
## 4 1.4218 nan 0.0100 0.0036
## 5 1.4183 nan 0.0100 0.0024
## 6 1.4145 nan 0.0100 0.0030
## 7 1.4099 nan 0.0100 0.0035
## 8 1.4057 nan 0.0100 0.0030
## 9 1.4022 nan 0.0100 0.0027
## 10 1.3986 nan 0.0100 0.0028
## 20 1.3637 nan 0.0100 0.0024
## 40 1.3041 nan 0.0100 0.0004
## 60 1.2531 nan 0.0100 0.0008
## 80 1.2104 nan 0.0100 0.0010
## 100 1.1750 nan 0.0100 0.0006
## 120 1.1460 nan 0.0100 0.0002
## 140 1.1186 nan 0.0100 0.0006
## 160 1.0944 nan 0.0100 0.0006
## 180 1.0722 nan 0.0100 0.0003
## 200 1.0516 nan 0.0100 -0.0003
## 220 1.0319 nan 0.0100 0.0004
## 240 1.0141 nan 0.0100 -0.0002
## 260 0.9978 nan 0.0100 -0.0002
## 280 0.9806 nan 0.0100 -0.0001
## 300 0.9661 nan 0.0100 -0.0001
## 320 0.9524 nan 0.0100 -0.0001
## 340 0.9381 nan 0.0100 -0.0001
## 360 0.9257 nan 0.0100 -0.0001
## 380 0.9130 nan 0.0100 -0.0001
## 400 0.9009 nan 0.0100 -0.0001
## 420 0.8893 nan 0.0100 -0.0005
## 440 0.8781 nan 0.0100 -0.0003
## 460 0.8674 nan 0.0100 -0.0006
## 480 0.8562 nan 0.0100 -0.0003
## 500 0.8460 nan 0.0100 -0.0003
## 520 0.8351 nan 0.0100 0.0000
## 540 0.8247 nan 0.0100 -0.0000
## 560 0.8150 nan 0.0100 -0.0003
## 580 0.8052 nan 0.0100 -0.0002
## 600 0.7954 nan 0.0100 -0.0001
## 620 0.7864 nan 0.0100 -0.0001
## 640 0.7778 nan 0.0100 -0.0004
## 660 0.7692 nan 0.0100 -0.0007
## 680 0.7612 nan 0.0100 -0.0004
## 700 0.7528 nan 0.0100 -0.0003
## 720 0.7446 nan 0.0100 -0.0001
## 740 0.7368 nan 0.0100 -0.0002
## 760 0.7292 nan 0.0100 -0.0002
## 780 0.7213 nan 0.0100 -0.0005
## 800 0.7138 nan 0.0100 -0.0001
## 820 0.7059 nan 0.0100 -0.0002
## 840 0.6992 nan 0.0100 -0.0004
## 860 0.6926 nan 0.0100 -0.0003
## 880 0.6858 nan 0.0100 -0.0003
## 900 0.6788 nan 0.0100 -0.0003
## 920 0.6724 nan 0.0100 -0.0002
## 940 0.6661 nan 0.0100 -0.0003
## 960 0.6598 nan 0.0100 -0.0002
## 980 0.6541 nan 0.0100 -0.0001
## 1000 0.6478 nan 0.0100 -0.0002
predictions_gbm <- predict(gbm_model, x_full_test, n.trees = 1000, type = "response")
mse_gbm <- mean((y_test - predictions_gbm)^2)
cat("GBM Test MSE:", mse_gbm, "\n")
## GBM Test MSE: 1.122145
summary(gbm_model)
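With `distribution = "gaussian"`, gbm builds the model additively: each new tree is fitted to the current residuals (the negative gradient of squared error) and added after multiplying by `shrinkage`. Because each step is fitted to reduce the training error, `TrainDeviance` in the log keeps falling through all 1000 iterations even after the `Improve` column (gbm's out-of-bag improvement estimate) turns negative, so the number of trees is usually chosen on held-out data, for example with `gbm.perf(gbm_model, method = "cv")` when `cv.folds > 1`, rather than fixed at `n.trees`. A from-scratch sketch of the boosting loop, assuming squared-error loss, depth-1 trees (stumps) instead of `interaction.depth = 3`, and no subsampling; all helper names are hypothetical.

```r
# Regression stump: the single split (column j, threshold thr) that best fits r by least squares
fit_stump <- function(x, r) {
  best <- list(sse = Inf)
  for (j in seq_len(ncol(x))) {
    for (thr in unique(quantile(x[, j], probs = seq(0.05, 0.95, by = 0.05)))) {
      left <- x[, j] <= thr
      if (all(left) || !any(left)) next
      mu_l <- mean(r[left]); mu_r <- mean(r[!left])
      sse <- sum((r - ifelse(left, mu_l, mu_r))^2)
      if (sse < best$sse) best <- list(sse = sse, j = j, thr = thr, left = mu_l, right = mu_r)
    }
  }
  best
}
predict_stump <- function(s, x) ifelse(x[, s$j] <= s$thr, s$left, s$right)

# Gradient boosting for squared-error loss: each stump is fitted to the current
# residuals (the negative gradient) and added with a shrinkage (learning-rate) factor.
boost <- function(x, y, n_trees = 200, shrinkage = 0.1) {
  f0 <- mean(y)
  pred <- rep(f0, length(y))
  stumps <- vector("list", n_trees)
  train_mse <- numeric(n_trees)
  for (m in seq_len(n_trees)) {
    stumps[[m]] <- fit_stump(x, y - pred)
    pred <- pred + shrinkage * predict_stump(stumps[[m]], x)
    train_mse[m] <- mean((y - pred)^2)
  }
  list(f0 = f0, stumps = stumps, shrinkage = shrinkage, train_mse = train_mse)
}
predict_boost <- function(model, x) {
  pred <- rep(model$f0, nrow(x))
  for (s in model$stumps) pred <- pred + model$shrinkage * predict_stump(s, x)
  pred
}

# Simulated example with a nonlinear effect of the first predictor
set.seed(5)
x <- matrix(runif(400 * 3), ncol = 3)
y <- sin(2 * pi * x[, 1]) + 0.5 * x[, 2] + rnorm(400, sd = 0.3)
model <- boost(x, y, n_trees = 200, shrinkage = 0.1)
tail(model$train_mse, 1)
```

As in the gbm log, the training error in `model$train_mse` never increases from one iteration to the next, which is exactly why it cannot be used on its own to decide when to stop.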
First 10 rows (metabolites) and 10 columns (participant IDs) of the serum metabolomics data:
load("/Users/allison/Library/CloudStorage/GoogleDrive-aflouie@usc.edu/My Drive/HELIX_data/metabol_serum.RData")
kable(metabol_serum.d[1:10,1:10], align="c", digits=2, format="pipe")
|   | 430 | 1187 | 940 | 936 | 788 | 698 | 380 | 196 | 114 | 885 |
|---|---|---|---|---|---|---|---|---|---|---|
| metab_1 | -2.15 | -0.69 | -0.69 | -0.19 | -1.96 | -1.90 | -0.22 | -1.38 | -0.54 | -1.25 |
| metab_2 | -0.71 | -0.37 | -0.36 | -0.34 | -0.35 | -0.63 | -0.26 | -0.46 | -0.44 | -0.48 |
| metab_3 | 8.60 | 9.15 | 8.95 | 8.54 | 8.73 | 8.24 | 9.03 | 8.29 | 8.37 | 8.18 |
| metab_4 | 0.55 | -1.33 | -0.13 | -0.62 | -0.80 | -0.46 | 0.49 | 0.12 | -0.76 | -0.07 |
| metab_5 | 7.05 | 6.89 | 7.10 | 7.01 | 6.90 | 6.94 | 6.77 | 6.62 | 6.85 | 7.24 |
| metab_6 | 5.79 | 5.81 | 5.86 | 5.95 | 5.95 | 5.42 | 5.82 | 5.65 | 5.44 | 5.60 |
| metab_7 | 3.75 | 4.26 | 4.35 | 4.24 | 4.88 | 4.70 | 4.08 | 4.73 | 3.98 | 4.30 |
| metab_8 | 5.07 | 5.08 | 5.92 | 5.41 | 5.39 | 4.62 | 5.10 | 5.28 | 4.51 | 5.45 |
| metab_9 | -1.87 | -2.30 | -1.97 | -1.89 | -1.55 | -1.78 | -2.29 | -1.64 | -2.02 | -1.68 |
| metab_10 | -2.77 | -3.42 | -3.40 | -2.84 | -2.45 | -3.14 | -3.36 | -2.88 | -3.05 | -2.92 |
metabol_serum_transposed <- as.data.frame(t(metabol_serum.d))
metabol_serum_transposed$ID <- as.integer(rownames(metabol_serum_transposed))
# Move the ID column to the first position
metabol_serum_transposed <- metabol_serum_transposed[, c("ID", setdiff(names(metabol_serum_transposed), "ID"))]
# ID is now the first column; the order of the remaining columns is unchanged
kable(head(metabol_serum_transposed), align = "c", digits = 2, format = "pipe")
|   | ID | metab_1 | metab_2 | metab_3 | metab_4 | metab_5 | metab_6 | metab_7 | metab_8 | metab_9 | metab_10 | metab_11 | metab_12 | metab_13 | metab_14 | metab_15 | metab_16 | metab_17 | metab_18 | metab_19 | metab_20 | metab_21 | metab_22 | metab_23 | metab_24 | metab_25 | metab_26 | metab_27 | metab_28 | metab_29 | metab_30 | metab_31 | metab_32 | metab_33 | metab_34 | metab_35 | metab_36 | metab_37 | metab_38 | metab_39 | metab_40 | metab_41 | metab_42 | metab_43 | metab_44 | metab_45 | metab_46 | metab_47 | metab_48 | metab_49 | metab_50 | metab_51 | metab_52 | metab_53 | metab_54 | metab_55 | metab_56 | metab_57 | metab_58 | metab_59 | metab_60 | metab_61 | metab_62 | metab_63 | metab_64 | metab_65 | metab_66 | metab_67 | metab_68 | metab_69 | metab_70 | metab_71 | metab_72 | metab_73 | metab_74 | metab_75 | metab_76 | metab_77 | metab_78 | metab_79 | metab_80 | metab_81 | metab_82 | metab_83 | metab_84 | metab_85 | metab_86 | metab_87 | metab_88 | metab_89 | metab_90 | metab_91 | metab_92 | metab_93 | metab_94 | metab_95 | metab_96 | metab_97 | metab_98 | metab_99 | metab_100 | metab_101 | metab_102 | metab_103 | metab_104 | metab_105 | metab_106 | metab_107 | metab_108 | metab_109 | metab_110 | metab_111 | metab_112 | metab_113 | metab_114 | metab_115 | metab_116 | metab_117 | metab_118 | metab_119 | metab_120 | metab_121 | metab_122 | metab_123 | metab_124 | metab_125 | metab_126 | metab_127 | metab_128 | metab_129 | metab_130 | metab_131 | metab_132 | metab_133 | metab_134 | metab_135 | metab_136 | metab_137 | metab_138 | metab_139 | metab_140 | metab_141 | metab_142 | metab_143 | metab_144 | metab_145 | metab_146 | metab_147 | metab_148 | metab_149 | metab_150 | metab_151 | metab_152 | metab_153 | metab_154 | metab_155 | metab_156 | metab_157 | metab_158 | metab_159 | metab_160 | metab_161 | metab_162 | metab_163 | metab_164 | metab_165 | metab_166 | metab_167 | metab_168 | metab_169 | metab_170 | metab_171 | metab_172 | metab_173 | metab_174 | metab_175 | metab_176 | metab_177 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 430 | 430 | -2.15 | -0.71 | 8.60 | 0.55 | 7.05 | 5.79 | 3.75 | 5.07 | -1.87 | -2.77 | -3.31 | -2.91 | -2.94 | -1.82 | -4.40 | -4.10 | -5.41 | -5.13 | -5.35 | -3.39 | -5.08 | -6.06 | -6.06 | -4.99 | -4.46 | -4.63 | -3.27 | -4.61 | 2.17 | -1.73 | -4.97 | -4.90 | -2.63 | -5.29 | -2.38 | -4.06 | -5.11 | -5.35 | -4.80 | -3.92 | -3.92 | -5.47 | -4.22 | -2.56 | -3.93 | 5.15 | 6.03 | 10.20 | 5.14 | 7.82 | 12.31 | 7.27 | 7.08 | 1.79 | 7.73 | 7.98 | 1.96 | 6.15 | 0.98 | 0.60 | 4.42 | 4.36 | 5.85 | 1.03 | 2.74 | -2.53 | -2.05 | -2.91 | -1.61 | -1.63 | 5.03 | 0.14 | 6.23 | -2.95 | 1.29 | 1.70 | -2.83 | 4.55 | 4.05 | 2.56 | -0.29 | 8.33 | 9.93 | 4.89 | 1.28 | 2.16 | 5.82 | 8.95 | 7.72 | 8.41 | 4.71 | 0.10 | 2.02 | 0.16 | 5.82 | 7.45 | 6.17 | 6.81 | -0.70 | -1.25 | -0.65 | 2.05 | 3.39 | 4.94 | -0.69 | -1.44 | -2.06 | -2.44 | -1.30 | -0.73 | -1.52 | -2.43 | -3.26 | 1.97 | 0.03 | 1.09 | 3.98 | 4.56 | 4.16 | 0.42 | 3.48 | 4.88 | 3.84 | 4.70 | 4.04 | 1.58 | -0.76 | 1.75 | 2.48 | 4.43 | 4.68 | 3.29 | 0.97 | 1.03 | 0.44 | 1.55 | 2.26 | 2.72 | 0.12 | -0.90 | -0.50 | 0.02 | -0.18 | 1.02 | -2.69 | -1.66 | 0.47 | 0.28 | 6.75 | 7.67 | -2.66 | -1.52 | 7.28 | -0.08 | 2.39 | 1.55 | 3.01 | 2.92 | -0.48 | 6.78 | 3.90 | 4.05 | 3.17 | -1.46 | 3.56 | 4.60 | -3.55 | -2.79 | -1.98 | -1.84 | 3.98 | 6.47 | 7.16 | -0.01 | 6.57 | 6.86 | 8.36 |
| 1187 | 1187 | -0.69 | -0.37 | 9.15 | -1.33 | 6.89 | 5.81 | 4.26 | 5.08 | -2.30 | -3.42 | -3.63 | -3.16 | -3.22 | -1.57 | -4.10 | -5.35 | -5.68 | -6.11 | -5.54 | -3.50 | -5.24 | -5.72 | -5.97 | -4.94 | -4.25 | -4.46 | -3.55 | -4.64 | 1.81 | -2.92 | -4.44 | -4.49 | -3.53 | -4.94 | -3.15 | -4.13 | -4.47 | -4.90 | -4.24 | -3.49 | -3.94 | -4.99 | -4.02 | -2.69 | -3.69 | 5.13 | 5.57 | 9.93 | 6.13 | 8.47 | 12.32 | 6.83 | 5.94 | 1.64 | 6.82 | 7.74 | 1.98 | 6.11 | 0.99 | 0.19 | 4.34 | 4.36 | 5.47 | 0.92 | 2.69 | -2.69 | -1.93 | -2.79 | -1.63 | -1.69 | 4.58 | 0.41 | 6.14 | -3.06 | 1.05 | 2.10 | -2.95 | 4.51 | 4.30 | 2.57 | 0.08 | 8.27 | 9.54 | 4.61 | 1.39 | 1.91 | 5.91 | 8.59 | 7.34 | 8.04 | 4.29 | -0.04 | 2.17 | 0.42 | 5.39 | 6.95 | 5.68 | 6.09 | -0.68 | -1.29 | -0.76 | 1.84 | 3.06 | 4.40 | -0.52 | -1.52 | -1.90 | -2.44 | -1.46 | -1.00 | -1.33 | -2.41 | -3.67 | 2.48 | 0.27 | 1.02 | 4.19 | 4.43 | 4.19 | 0.33 | 3.24 | 4.38 | 3.92 | 5.09 | 4.42 | 1.01 | -0.53 | 1.36 | 2.25 | 4.54 | 5.10 | 3.45 | 0.65 | 0.83 | 0.36 | 1.68 | 2.56 | 2.70 | 0.02 | -1.02 | -0.93 | -0.22 | 0.11 | 1.60 | -2.70 | -1.31 | 1.08 | 0.54 | 6.29 | 7.97 | -3.22 | -1.34 | 7.50 | 0.48 | 2.19 | 1.49 | 3.09 | 2.71 | -0.38 | 6.86 | 3.77 | 4.31 | 3.23 | -1.82 | 3.80 | 5.05 | -3.31 | -2.18 | -2.21 | -2.01 | 4.91 | 6.84 | 7.14 | 0.14 | 6.03 | 6.55 | 7.91 |
| 940 | 940 | -0.69 | -0.36 | 8.95 | -0.13 | 7.10 | 5.86 | 4.35 | 5.92 | -1.97 | -3.40 | -3.41 | -2.99 | -3.01 | -1.65 | -3.55 | -4.82 | -5.41 | -5.84 | -5.13 | -2.83 | -4.86 | -5.51 | -5.51 | -4.63 | -3.73 | -4.00 | -2.92 | -4.21 | 2.79 | -1.41 | -4.80 | -5.47 | -2.10 | -5.47 | -2.14 | -4.18 | -4.84 | -5.24 | -4.64 | -3.20 | -3.90 | -5.24 | -3.77 | -2.70 | -2.76 | 5.21 | 5.86 | 9.78 | 6.38 | 8.29 | 12.49 | 7.01 | 6.49 | 1.97 | 7.17 | 7.62 | 2.40 | 6.93 | 1.85 | 1.45 | 5.11 | 5.30 | 6.27 | 2.35 | 3.31 | -2.50 | -1.41 | -2.61 | -0.93 | -1.03 | 4.54 | 1.59 | 6.03 | -2.74 | 1.79 | 2.68 | -8.16 | 5.19 | 5.14 | 3.16 | 0.24 | 9.09 | 10.25 | 5.44 | 1.90 | 2.46 | 6.66 | 9.19 | 8.24 | 8.46 | 5.73 | 1.10 | 2.58 | 1.15 | 6.37 | 7.28 | 6.51 | 7.20 | -0.48 | -0.69 | -0.02 | 2.56 | 3.76 | 5.33 | -0.16 | -1.18 | -1.18 | -2.16 | -1.06 | -0.19 | -0.48 | -2.35 | -3.16 | 2.79 | 0.72 | 2.14 | 4.80 | 4.84 | 4.55 | 1.27 | 4.26 | 5.23 | 4.40 | 5.43 | 4.56 | 2.32 | 0.03 | 2.15 | 3.22 | 5.06 | 5.28 | 3.80 | 1.38 | 1.58 | 0.98 | 2.27 | 2.94 | 3.39 | 0.33 | -0.53 | 0.17 | 0.53 | 0.57 | 1.69 | -2.21 | -0.76 | 1.25 | 0.49 | 6.49 | 8.84 | -4.02 | -1.33 | 7.42 | 0.71 | 2.81 | 2.03 | 3.30 | 3.00 | -0.24 | 7.02 | 3.82 | 4.66 | 3.36 | -1.18 | 3.82 | 4.91 | -2.95 | -2.89 | -2.43 | -2.05 | 4.25 | 7.02 | 7.36 | 0.14 | 6.57 | 6.68 | 8.12 |
| 936 | 936 | -0.19 | -0.34 | 8.54 | -0.62 | 7.01 | 5.95 | 4.24 | 5.41 | -1.89 | -2.84 | -3.38 | -3.11 | -2.94 | -1.45 | -3.83 | -4.43 | -5.61 | -5.41 | -5.54 | -2.94 | -4.78 | -6.06 | -5.88 | -4.70 | -4.82 | -4.46 | -2.66 | -3.82 | 2.85 | -2.70 | -5.16 | -5.47 | -3.31 | -5.61 | -2.80 | -4.11 | -4.97 | -4.86 | -5.01 | -3.63 | -3.78 | -5.29 | -4.17 | -2.49 | -3.65 | 5.31 | 5.60 | 9.87 | 6.67 | 8.05 | 12.33 | 6.72 | 6.42 | 1.25 | 7.28 | 7.37 | 1.99 | 6.28 | 1.17 | 0.50 | 4.52 | 4.43 | 5.54 | 1.30 | 3.08 | -2.92 | -2.16 | -3.18 | -1.66 | -1.63 | 4.55 | 0.53 | 5.73 | -3.27 | 1.30 | 1.70 | -2.57 | 4.53 | 4.14 | 2.61 | -0.18 | 8.32 | 9.62 | 4.82 | 1.58 | 1.99 | 5.82 | 8.59 | 7.58 | 8.39 | 4.68 | 0.36 | 2.01 | -0.31 | 5.71 | 7.35 | 6.22 | 6.66 | -0.70 | -1.42 | -0.62 | 2.13 | 3.54 | 4.85 | -0.72 | -1.53 | -2.04 | -2.37 | -1.38 | -0.96 | -1.57 | -2.91 | -3.60 | 2.37 | 0.21 | 0.92 | 4.05 | 4.27 | 4.33 | 0.24 | 3.38 | 4.45 | 3.71 | 4.74 | 4.44 | 1.51 | -1.73 | 1.51 | 2.27 | 4.37 | 4.89 | 3.40 | 0.66 | 0.83 | 0.27 | 1.50 | 2.30 | 2.60 | 0.14 | -0.90 | -0.99 | -0.53 | -0.30 | 1.14 | -3.06 | -1.69 | 0.39 | 0.19 | 6.21 | 8.05 | -2.75 | -0.87 | 7.79 | 0.87 | 2.48 | 1.62 | 3.28 | 2.93 | -0.41 | 6.91 | 3.75 | 4.38 | 3.20 | -1.07 | 3.81 | 4.89 | -3.36 | -2.40 | -2.06 | -2.03 | 3.99 | 7.36 | 6.94 | 0.14 | 6.26 | 6.47 | 7.98 |
| 788 | 788 | -1.96 | -0.35 | 8.73 | -0.80 | 6.90 | 5.95 | 4.88 | 5.39 | -1.55 | -2.45 | -3.51 | -2.84 | -2.83 | -1.71 | -3.91 | -4.05 | -5.61 | -4.63 | -5.29 | -3.51 | -4.86 | -5.97 | -5.27 | -4.90 | -4.40 | -4.63 | -3.11 | -3.99 | 2.87 | -2.23 | -4.61 | -5.04 | -3.53 | -5.08 | -3.02 | -4.41 | -4.72 | -5.18 | -4.72 | -3.63 | -3.61 | -5.29 | -4.05 | -2.31 | -3.73 | 4.69 | 5.31 | 9.69 | 6.76 | 8.21 | 12.18 | 6.75 | 6.51 | 1.15 | 7.38 | 7.93 | 1.76 | 5.68 | -0.02 | -0.65 | 4.14 | 3.36 | 4.43 | 0.21 | 1.98 | -2.31 | -1.54 | -2.30 | -1.66 | -1.47 | 4.48 | 0.88 | 6.47 | -2.50 | 0.74 | 1.12 | -2.17 | 4.31 | 3.50 | 2.09 | -0.60 | 8.06 | 9.69 | 3.99 | 0.54 | 1.60 | 5.60 | 8.71 | 7.32 | 8.03 | 3.27 | -0.98 | 1.59 | -0.20 | 5.68 | 7.16 | 5.57 | 6.16 | -0.79 | -1.31 | -0.87 | 2.17 | 3.23 | 4.57 | -0.93 | -1.80 | -2.27 | -2.51 | -1.74 | -1.02 | -1.92 | -2.02 | -3.79 | 1.95 | -0.24 | 0.40 | 3.73 | 4.13 | 3.71 | 0.03 | 2.89 | 4.06 | 3.54 | 4.76 | 3.88 | 0.53 | -2.11 | 1.27 | 1.99 | 4.13 | 4.58 | 2.88 | 0.22 | 0.39 | 0.22 | 1.44 | 2.02 | 2.22 | 0.00 | -0.81 | -1.10 | -0.41 | -0.09 | 1.00 | -2.66 | -1.55 | 0.33 | 0.19 | 6.47 | 7.89 | -4.40 | -1.94 | 7.65 | 0.38 | 1.66 | 0.84 | 2.78 | 2.26 | -0.84 | 6.52 | 3.53 | 3.81 | 2.83 | -1.69 | 3.65 | 4.47 | -3.81 | -2.97 | -2.88 | -2.29 | 3.88 | 6.99 | 7.38 | -0.10 | 6.00 | 6.52 | 8.04 |
| 698 | 698 | -1.90 | -0.63 | 8.24 | -0.46 | 6.94 | 5.42 | 4.70 | 4.62 | -1.78 | -3.14 | -3.46 | -2.90 | -2.94 | -1.65 | -4.20 | -4.56 | -5.68 | -5.61 | -5.41 | -2.92 | -5.04 | -5.97 | -6.06 | -4.90 | -4.22 | -4.20 | -3.05 | -4.61 | 2.15 | -2.87 | -4.68 | -5.08 | -3.69 | -5.24 | -3.63 | -4.24 | -5.16 | -5.35 | -4.97 | -3.61 | -3.99 | -5.35 | -3.98 | -2.59 | -3.95 | 5.15 | 5.82 | 10.00 | 5.54 | 8.15 | 12.28 | 6.80 | 6.23 | 1.88 | 7.07 | 7.38 | 2.06 | 6.79 | 1.67 | 1.00 | 4.79 | 4.79 | 5.71 | 1.99 | 3.29 | -2.13 | -1.01 | -1.85 | -1.23 | -0.90 | 4.41 | -0.02 | 6.09 | -2.10 | 1.66 | 2.27 | -3.48 | 4.96 | 4.76 | 2.64 | 0.05 | 8.91 | 9.99 | 5.16 | 1.53 | 2.11 | 6.28 | 8.77 | 8.03 | 8.66 | 5.99 | 0.87 | 2.30 | 0.63 | 6.23 | 7.50 | 6.75 | 7.22 | -0.45 | -0.81 | -0.11 | 2.57 | 3.93 | 5.16 | -0.31 | -1.19 | -1.25 | -1.93 | -0.89 | 0.07 | -0.87 | -1.12 | -3.03 | 2.61 | 0.54 | 1.83 | 4.50 | 4.53 | 4.42 | 1.15 | 4.02 | 4.91 | 4.06 | 5.06 | 4.42 | 2.02 | -1.03 | 1.87 | 2.96 | 4.84 | 5.08 | 3.62 | 1.13 | 1.23 | 0.75 | 2.26 | 2.80 | 3.04 | 0.41 | -0.39 | 0.02 | 0.31 | 0.52 | 1.73 | -2.28 | -0.73 | 1.06 | 0.72 | 6.44 | 7.27 | -3.08 | -1.23 | 7.35 | 0.92 | 2.60 | 2.00 | 3.69 | 3.20 | -0.25 | 7.38 | 4.15 | 5.00 | 3.88 | -1.39 | 4.31 | 5.20 | -3.47 | -2.75 | -1.97 | -1.96 | 4.18 | 6.81 | 6.75 | 0.02 | 6.49 | 5.97 | 7.78 |
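`combined_data`, used in the next step, is built elsewhere in the analysis and is not shown in this section. A minimal sketch, assuming it is an inner join of a per-child exposure/covariate table with `metabol_serum_transposed` on `ID`; the helper name, toy tables and values below are hypothetical. Checking that `ID` is present, non-missing and unique in both tables before joining guards against silently duplicated or dropped children.

```r
# Hypothetical sketch: join a per-child covariate table with the transposed
# metabolomics table on ID, after checking that ID is a valid key in both.
join_by_id <- function(covariates, metabolomics) {
  for (tbl in list(covariates, metabolomics)) {
    stopifnot("ID" %in% names(tbl), !anyNA(tbl$ID), !anyDuplicated(tbl$ID))
  }
  merge(covariates, metabolomics, by = "ID")  # inner join: children in both tables
}

# Toy tables (made-up values):
toy_cov <- data.frame(ID = c(430L, 1187L, 999L), hs_zbmi_who = c(0.4, -1.1, 2.0))
toy_met <- data.frame(ID = c(1187L, 430L), metab_1 = c(-0.69, -2.15))
joined <- join_by_id(toy_cov, toy_met)
joined
```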
selected_metabolomics_data <- combined_data %>% dplyr::select(-c(ID))
# Drop rows with any missing value (complete-case analysis); this may bias estimates, but full imputation is hard here
selected_metabolomics_data <- selected_metabolomics_data %>% na.omit()
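Since `na.omit()` discards a whole row as soon as any single variable is missing, it is worth recording how many children are lost and which variables drive the losses. A minimal base-R sketch; the helper name and the toy data are hypothetical, and the report should be run on the data *before* the `na.omit()` call.

```r
# Summarize missingness before a complete-case filter such as na.omit():
# how many rows survive, and which columns account for the missing values.
missingness_report <- function(df) {
  n_missing <- colSums(is.na(df))
  list(
    rows_before = nrow(df),
    rows_complete = sum(complete.cases(df)),
    missing_by_column = sort(n_missing[n_missing > 0], decreasing = TRUE)
  )
}

# Toy data (made-up values):
toy <- data.frame(a = c(1, NA, 3, 4), b = c(NA, NA, 1, 2), c = 1:4)
r <- missingness_report(toy)
r

# With the real data (not run here), before na.omit():
# missingness_report(dplyr::select(combined_data, -ID))
```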
set.seed(101)
trainIndex <- createDataPartition(selected_metabolomics_data$hs_zbmi_who, p = .7,
                                  list = FALSE,
                                  times = 1)
train_data <- selected_metabolomics_data[ trainIndex,]
test_data <- selected_metabolomics_data[-trainIndex,]
x_train <- model.matrix(hs_zbmi_who ~ ., train_data)[,-1]
y_train <- train_data$hs_zbmi_who
x_test <- model.matrix(hs_zbmi_who ~ ., test_data)[,-1]
y_test <- test_data$hs_zbmi_who
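`model.matrix()` expands each factor into 0/1 indicator columns (treatment contrasts, with the first level as the reference), which is why the coefficient tables below list names such as `h_cohort2` or `e3_sex_Nonemale` rather than the original variable names; `[, -1]` drops the `(Intercept)` column because `glmnet` fits its own intercept. By default `model.matrix()` also drops rows with missing values, so removing NAs beforehand keeps `x_train` and `y_train` aligned. A toy illustration with made-up data:

```r
# Toy data: one numeric predictor and one 3-level factor.
toy <- data.frame(
  y = c(1.2, -0.3, 0.8, 0.1),
  age = c(6, 7, 8, 9),
  cohort = factor(c("1", "2", "3", "1"))
)
x <- model.matrix(y ~ ., toy)[, -1]  # drop the intercept column
colnames(x)
# "age" "cohort2" "cohort3": level "1" is the reference and gets no column
x
```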
lasso_model <- cv.glmnet(x_train, y_train, alpha = 1, family = "gaussian")
plot(lasso_model)
lasso_model$lambda.min
## [1] 0.00693355
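`lambda.min` is the penalty with the lowest mean cross-validated error. `cv.glmnet` also stores `lambda.1se`, the largest penalty whose CV error is within one standard error of that minimum; it gives a sparser model whose error is statistically hard to distinguish from the best one. The sketch below reimplements that rule from the `lambda`, `cvm` and `cvsd` components of a `cv.glmnet` object, using toy values; in practice `lasso_model$lambda.1se` already holds the result.

```r
# One-standard-error rule: among penalties whose mean CV error is within
# one SE of the minimum, choose the largest (most regularized) one.
one_se_lambda <- function(lambda, cvm, cvsd) {
  i_min <- which.min(cvm)
  threshold <- cvm[i_min] + cvsd[i_min]
  max(lambda[cvm <= threshold])
}

# Toy CV curve (made-up values), lambda decreasing as glmnet stores it.
lambda <- c(0.10, 0.05, 0.02, 0.01, 0.005)
cvm    <- c(0.95, 0.82, 0.78, 0.77, 0.79)
cvsd   <- c(0.04, 0.03, 0.03, 0.03, 0.03)
one_se_lambda(lambda, cvm, cvsd)

# With the real fit (not run here):
# one_se_lambda(lasso_model$lambda, lasso_model$cvm, lasso_model$cvsd)
# coef(lasso_model, s = "lambda.1se")
```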
coef(lasso_model, s = lasso_model$lambda.min)
## 294 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 11.9344434811
## hs_child_age_None -0.1333552616
## h_cohort2 -0.1496542270
## h_cohort3 0.3044563109
## h_cohort4 0.3321092436
## h_cohort5 .
## h_cohort6 .
## e3_sex_Nonemale 0.2914962364
## e3_yearbir_None2004 -0.1232062358
## e3_yearbir_None2005 -0.0600009953
## e3_yearbir_None2006 .
## e3_yearbir_None2007 0.0677303583
## e3_yearbir_None2008 .
## e3_yearbir_None2009 0.4443304231
## h_edumc_None2 0.0020404685
## h_edumc_None3 0.0548205075
## h_native_None1 .
## h_native_None2 0.0318678144
## hs_as_c_Log2 .
## hs_cd_c_Log2 0.0046892602
## hs_co_c_Log2 0.0026641062
## hs_cs_c_Log2 0.0844273798
## hs_cu_c_Log2 0.1562222585
## hs_hg_c_Log2 -0.0508921455
## hs_mn_c_Log2 -0.1603630035
## hs_mo_c_Log2 -0.0550542422
## hs_pb_c_Log2 .
## hs_tl_cdich_NoneUndetected .
## hs_dde_cadj_Log2 -0.0152351114
## hs_ddt_cadj_Log2 0.0019335326
## hs_hcb_cadj_Log2 -0.1496621452
## hs_pcb118_cadj_Log2 0.0691102100
## hs_pcb138_cadj_Log2 -0.1173108825
## hs_pcb153_cadj_Log2 -0.0740533391
## hs_pcb170_cadj_Log2 -0.0292899754
## hs_pcb180_cadj_Log2 -0.0160089398
## hs_dep_cadj_Log2 -0.0152892988
## hs_detp_cadj_Log2 0.0092683234
## hs_dmdtp_cdich_NoneUndetected .
## hs_dmp_cadj_Log2 -0.0042231171
## hs_dmtp_cadj_Log2 0.0097716060
## hs_pbde153_cadj_Log2 -0.0125551780
## hs_pbde47_cadj_Log2 .
## hs_pfhxs_c_Log2 0.0061603140
## hs_pfna_c_Log2 0.0120588162
## hs_pfoa_c_Log2 .
## hs_pfos_c_Log2 .
## hs_pfunda_c_Log2 .
## hs_bpa_cadj_Log2 -0.0353020063
## hs_bupa_cadj_Log2 .
## hs_etpa_cadj_Log2 -0.0020532156
## hs_mepa_cadj_Log2 -0.0157017242
## hs_oxbe_cadj_Log2 -0.0017632131
## hs_prpa_cadj_Log2 .
## hs_trcs_cadj_Log2 -0.0138764899
## hs_mbzp_cadj_Log2 0.0536927844
## hs_mecpp_cadj_Log2 .
## hs_mehhp_cadj_Log2 .
## hs_mehp_cadj_Log2 -0.0264868257
## hs_meohp_cadj_Log2 .
## hs_mep_cadj_Log2 -0.0157426652
## hs_mibp_cadj_Log2 0.0075714128
## hs_mnbp_cadj_Log2 .
## hs_ohminp_cadj_Log2 .
## hs_oxominp_cadj_Log2 0.0332438097
## FAS_cat_NoneMiddle 0.1014322045
## FAS_cat_NoneHigh 0.0167115030
## hs_contactfam_3cat_num_NoneOnce a week .
## hs_contactfam_3cat_num_NoneLess than once a week .
## hs_hm_pers_None 0.0015902201
## hs_participation_3cat_None1 organisation -0.1080493082
## hs_participation_3cat_None2 or more organisations -0.0308000236
## hs_cotinine_cdich_NoneUndetected .
## hs_globalexp2_Noneno exposure -0.0278623328
## hs_smk_parents_Noneneither .
## hs_smk_parents_Noneone .
## h_bfdur_Ter(10.8,34.9] 0.1015904168
## h_bfdur_Ter(34.9,Inf] 0.1073426922
## hs_bakery_prod_Ter(2,6] .
## hs_bakery_prod_Ter(6,Inf] -0.0940310022
## hs_beverages_Ter(0.132,1] 0.0156703517
## hs_beverages_Ter(1,Inf] -0.0389725921
## hs_break_cer_Ter(1.1,5.5] 0.0009913745
## hs_break_cer_Ter(5.5,Inf] .
## hs_caff_drink_Ter(0.132,Inf] .
## hs_dairy_Ter(14.6,25.6] .
## hs_dairy_Ter(25.6,Inf] 0.1237503599
## hs_fastfood_Ter(0.132,0.5] .
## hs_fastfood_Ter(0.5,Inf] -0.0065190649
## h_legume_preg_Ter(0.5,2] .
## h_legume_preg_Ter(2,Inf] .
## hs_org_food_Ter(0.132,1] 0.0263738630
## hs_org_food_Ter(1,Inf] 0.0089687080
## hs_proc_meat_Ter(1.5,4] 0.0706991488
## hs_proc_meat_Ter(4,Inf] .
## hs_readymade_Ter(0.132,0.5] 0.0743810914
## hs_readymade_Ter(0.5,Inf] 0.0526633081
## hs_total_bread_Ter(7,17.5] .
## hs_total_bread_Ter(17.5,Inf] .
## hs_total_cereal_Ter(14.1,23.6] -0.0052924501
## hs_total_cereal_Ter(23.6,Inf] -0.0261686307
## hs_total_fish_Ter(1.5,3] .
## hs_total_fish_Ter(3,Inf] -0.0191887588
## hs_total_fruits_Ter(7,14.1] 0.0995108081
## hs_total_fruits_Ter(14.1,Inf] 0.1444763278
## hs_total_lipids_Ter(3,7] 0.0649609851
## hs_total_lipids_Ter(7,Inf] .
## hs_total_meat_Ter(6,9] -0.0698846806
## hs_total_meat_Ter(9,Inf] 0.0494944824
## hs_total_potatoes_Ter(3,4] 0.0201324815
## hs_total_potatoes_Ter(4,Inf] .
## hs_total_sweets_Ter(4.1,8.5] -0.0131914826
## hs_total_sweets_Ter(8.5,Inf] .
## hs_total_veg_Ter(6,8.5] 0.0083023669
## hs_total_veg_Ter(8.5,Inf] .
## hs_total_yog_Ter(6,8.5] -0.1414862249
## hs_total_yog_Ter(8.5,Inf] -0.2103339718
## metab_1 -0.0161612291
## metab_2 0.0639087000
## metab_3 .
## metab_4 0.0353803791
## metab_5 0.5231521888
## metab_6 -0.0692928964
## metab_7 .
## metab_8 0.1662468669
## metab_9 .
## metab_10 0.0878429500
## metab_11 0.2098076802
## metab_12 -0.1734629049
## metab_13 .
## metab_14 -0.3799696854
## metab_15 .
## metab_16 .
## metab_17 .
## metab_18 -0.1864053887
## metab_19 .
## metab_20 .
## metab_21 .
## metab_22 -0.2686850695
## metab_23 0.0578179689
## metab_24 0.6948022269
## metab_25 -0.0681555661
## metab_26 -0.1533297717
## metab_27 0.4535185102
## metab_28 .
## metab_29 -0.0556735120
## metab_30 0.1520492956
## metab_31 0.0159236768
## metab_32 -0.1252442603
## metab_33 .
## metab_34 .
## metab_35 .
## metab_36 .
## metab_37 -0.0324673521
## metab_38 -0.0414669154
## metab_39 .
## metab_40 0.0885647709
## metab_41 0.2908769427
## metab_42 -0.4296851548
## metab_43 -0.1855903360
## metab_44 -0.0048542189
## metab_45 0.0995057474
## metab_46 -0.0026663585
## metab_47 0.4180859602
## metab_48 -0.8069685787
## metab_49 0.1303056697
## metab_50 -0.1779314339
## metab_51 .
## metab_52 0.4378234919
## metab_53 .
## metab_54 0.1710308286
## metab_55 .
## metab_56 -0.1011131936
## metab_57 .
## metab_58 .
## metab_59 0.5460263989
## metab_60 -0.0949858038
## metab_61 .
## metab_62 .
## metab_63 -0.1514962420
## metab_64 .
## metab_65 .
## metab_66 .
## metab_67 -0.2479786532
## metab_68 0.1324865931
## metab_69 -0.0137513134
## metab_70 .
## metab_71 -0.0208093223
## metab_72 .
## metab_73 -0.1394831346
## metab_74 .
## metab_75 0.2597833693
## metab_76 .
## metab_77 0.0159767953
## metab_78 -0.0548497413
## metab_79 .
## metab_80 .
## metab_81 .
## metab_82 -0.6225580602
## metab_83 .
## metab_84 -0.0915721643
## metab_85 .
## metab_86 0.3905176681
## metab_87 0.0441136906
## metab_88 0.4966852624
## metab_89 -1.2029998701
## metab_90 .
## metab_91 0.1198932694
## metab_92 0.0852711215
## metab_93 .
## metab_94 -0.0582548003
## metab_95 1.6313000805
## metab_96 0.0418905876
## metab_97 .
## metab_98 .
## metab_99 -0.4867234866
## metab_100 0.5619333748
## metab_101 .
## metab_102 .
## metab_103 -0.4768228828
## metab_104 0.1727697356
## metab_105 0.2274243586
## metab_106 .
## metab_107 0.0696204807
## metab_108 -0.1019120281
## metab_109 -0.2138253777
## metab_110 -0.1790634382
## metab_111 .
## metab_112 .
## metab_113 0.5674034616
## metab_114 .
## metab_115 0.4652000705
## metab_116 .
## metab_117 .
## metab_118 -0.3377228473
## metab_119 .
## metab_120 -0.3336407204
## metab_121 .
## metab_122 .
## metab_123 .
## metab_124 .
## metab_125 -0.1452415612
## metab_126 .
## metab_127 -0.0365590835
## metab_128 .
## metab_129 .
## metab_130 .
## metab_131 .
## metab_132 .
## metab_133 -0.3940659478
## metab_134 0.3674417619
## metab_135 -0.1984828231
## metab_136 .
## metab_137 -0.2793043247
## metab_138 -0.5050933616
## metab_139 .
## metab_140 .
## metab_141 .
## metab_142 -0.5506004047
## metab_143 -0.2325955157
## metab_144 .
## metab_145 -0.1878621998
## metab_146 .
## metab_147 0.3968109061
## metab_148 .
## metab_149 .
## metab_150 0.2473649353
## metab_151 .
## metab_152 -0.0480472668
## metab_153 .
## metab_154 -0.0215930501
## metab_155 -0.2741135855
## metab_156 .
## metab_157 0.1444878519
## metab_158 .
## metab_159 0.1277082260
## metab_160 -2.3349476269
## metab_161 2.5445557395
## metab_162 .
## metab_163 0.5630665285
## metab_164 -0.1211933641
## metab_165 .
## metab_166 -0.3005439845
## metab_167 -0.0951752455
## metab_168 -0.0014740884
## metab_169 .
## metab_170 -0.0158417720
## metab_171 -0.0926591230
## metab_172 .
## metab_173 .
## metab_174 .
## metab_175 -0.2688172334
## metab_176 -0.0425080580
## metab_177 0.0653934569
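Most entries in the sparse matrix above are `.` (exactly zero), meaning the lasso dropped those predictors at `lambda.min`. A small sketch for listing only the retained terms, ordered by absolute size; the helper name and toy matrix are hypothetical. glmnet standardizes predictors internally by default but reports coefficients on each variable's original scale, so magnitudes are not directly comparable between, say, a log2 exposure, a dummy variable and a metabolite.

```r
# List the non-zero coefficients of a one-column coefficient matrix
# (e.g. as.matrix(coef(cv_fit, s = "lambda.min"))), dropping the
# intercept and ordering by absolute value.
nonzero_coefs <- function(coef_mat) {
  b <- coef_mat[, 1]
  b <- b[names(b) != "(Intercept)" & b != 0]
  b[order(abs(b), decreasing = TRUE)]
}

# Toy coefficient matrix (made-up values):
toy_coef <- matrix(c(11.9, 0, -0.38, 0.05, 0),
                   dimnames = list(c("(Intercept)", "metab_3", "metab_14",
                                     "metab_2", "metab_7"), "s1"))
r <- nonzero_coefs(toy_coef)
r

# With the real fit (not run here):
# nonzero_coefs(as.matrix(coef(lasso_model, s = lasso_model$lambda.min)))
```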
predictions <- predict(lasso_model, s = lasso_model$lambda.min, newx = x_test)
test_mse <- mean((predictions - y_test)^2)
cat("Mean Squared Error on Test Set:", test_mse, "\n")
## Mean Squared Error on Test Set: 0.76458
set.seed(101)
trainIndex <- createDataPartition(selected_metabolomics_data$hs_zbmi_who, p = .7,
                                  list = FALSE,
                                  times = 1)
train_data <- selected_metabolomics_data[ trainIndex,]
test_data <- selected_metabolomics_data[-trainIndex,]
x_train <- model.matrix(hs_zbmi_who ~ ., train_data)[,-1]
y_train <- train_data$hs_zbmi_who
x_test <- model.matrix(hs_zbmi_who ~ ., test_data)[,-1]
y_test <- test_data$hs_zbmi_who
ridge_model <- cv.glmnet(x_train, y_train, alpha = 0, family = "gaussian")
plot(ridge_model)
ridge_model$lambda.min
## [1] 0.1393112
coef(ridge_model, s = ridge_model$lambda.min)
## 294 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 3.3089981515
## hs_child_age_None -0.0758812140
## h_cohort2 -0.2426750383
## h_cohort3 0.2076742195
## h_cohort4 0.2611899005
## h_cohort5 -0.0123322131
## h_cohort6 0.0148225642
## e3_sex_Nonemale 0.2118933820
## e3_yearbir_None2004 -0.1573272581
## e3_yearbir_None2005 -0.0809358242
## e3_yearbir_None2006 0.0333606705
## e3_yearbir_None2007 0.0954717699
## e3_yearbir_None2008 0.0388027677
## e3_yearbir_None2009 0.5581937203
## h_edumc_None2 0.0334816845
## h_edumc_None3 0.0840387720
## h_native_None1 -0.0383836005
## h_native_None2 0.0665177234
## hs_as_c_Log2 -0.0036290317
## hs_cd_c_Log2 0.0133594696
## hs_co_c_Log2 0.0146611792
## hs_cs_c_Log2 0.0896426341
## hs_cu_c_Log2 0.1914141905
## hs_hg_c_Log2 -0.0482853853
## hs_mn_c_Log2 -0.1417480861
## hs_mo_c_Log2 -0.0625678214
## hs_pb_c_Log2 -0.0134180068
## hs_tl_cdich_NoneUndetected -0.0107360272
## hs_dde_cadj_Log2 -0.0293003496
## hs_ddt_cadj_Log2 0.0033002891
## hs_hcb_cadj_Log2 -0.1442484778
## hs_pcb118_cadj_Log2 0.0696610668
## hs_pcb138_cadj_Log2 -0.0909239539
## hs_pcb153_cadj_Log2 -0.1045130094
## hs_pcb170_cadj_Log2 -0.0345846647
## hs_pcb180_cadj_Log2 -0.0181751784
## hs_dep_cadj_Log2 -0.0159138528
## hs_detp_cadj_Log2 0.0111003114
## hs_dmdtp_cdich_NoneUndetected 0.0071882679
## hs_dmp_cadj_Log2 -0.0057332145
## hs_dmtp_cadj_Log2 0.0129542729
## hs_pbde153_cadj_Log2 -0.0141856805
## hs_pbde47_cadj_Log2 -0.0003294178
## hs_pfhxs_c_Log2 0.0126776131
## hs_pfna_c_Log2 0.0171720119
## hs_pfoa_c_Log2 -0.0100373058
## hs_pfos_c_Log2 -0.0067173662
## hs_pfunda_c_Log2 -0.0047981683
## hs_bpa_cadj_Log2 -0.0412629741
## hs_bupa_cadj_Log2 -0.0001002913
## hs_etpa_cadj_Log2 -0.0026347191
## hs_mepa_cadj_Log2 -0.0202745438
## hs_oxbe_cadj_Log2 -0.0025832346
## hs_prpa_cadj_Log2 0.0039848865
## hs_trcs_cadj_Log2 -0.0153140235
## hs_mbzp_cadj_Log2 0.0509829559
## hs_mecpp_cadj_Log2 -0.0022019114
## hs_mehhp_cadj_Log2 -0.0093787384
## hs_mehp_cadj_Log2 -0.0209094483
## hs_meohp_cadj_Log2 -0.0132165148
## hs_mep_cadj_Log2 -0.0144548878
## hs_mibp_cadj_Log2 0.0262271198
## hs_mnbp_cadj_Log2 -0.0145955303
## hs_ohminp_cadj_Log2 -0.0010593644
## hs_oxominp_cadj_Log2 0.0463732822
## FAS_cat_NoneMiddle 0.1264202906
## FAS_cat_NoneHigh 0.0584125535
## hs_contactfam_3cat_num_NoneOnce a week -0.0084702170
## hs_contactfam_3cat_num_NoneLess than once a week -0.0394068552
## hs_hm_pers_None 0.0048287115
## hs_participation_3cat_None1 organisation -0.1388096284
## hs_participation_3cat_None2 or more organisations -0.0559814618
## hs_cotinine_cdich_NoneUndetected -0.0516985618
## hs_globalexp2_Noneno exposure -0.0373366092
## hs_smk_parents_Noneneither 0.0142323775
## hs_smk_parents_Noneone 0.0145620017
## h_bfdur_Ter(10.8,34.9] 0.1218319627
## h_bfdur_Ter(34.9,Inf] 0.1448195281
## hs_bakery_prod_Ter(2,6] 0.0152684351
## hs_bakery_prod_Ter(6,Inf] -0.1082447934
## hs_beverages_Ter(0.132,1] 0.0116773508
## hs_beverages_Ter(1,Inf] -0.0537775778
## hs_break_cer_Ter(1.1,5.5] 0.0347675734
## hs_break_cer_Ter(5.5,Inf] 0.0494987351
## hs_caff_drink_Ter(0.132,Inf] -0.0052055424
## hs_dairy_Ter(14.6,25.6] -0.0319959258
## hs_dairy_Ter(25.6,Inf] 0.1057593901
## hs_fastfood_Ter(0.132,0.5] -0.0184299757
## hs_fastfood_Ter(0.5,Inf] -0.0334658146
## h_legume_preg_Ter(0.5,2] -0.0228066186
## h_legume_preg_Ter(2,Inf] -0.0496898420
## hs_org_food_Ter(0.132,1] 0.0446865822
## hs_org_food_Ter(1,Inf] 0.0320793509
## hs_proc_meat_Ter(1.5,4] 0.0579793476
## hs_proc_meat_Ter(4,Inf] -0.0140020101
## hs_readymade_Ter(0.132,0.5] 0.1128738802
## hs_readymade_Ter(0.5,Inf] 0.0716067307
## hs_total_bread_Ter(7,17.5] 0.0074198676
## hs_total_bread_Ter(17.5,Inf] 0.0362514353
## hs_total_cereal_Ter(14.1,23.6] -0.0405249075
## hs_total_cereal_Ter(23.6,Inf] -0.0835241184
## hs_total_fish_Ter(1.5,3] -0.0160341616
## hs_total_fish_Ter(3,Inf] -0.0304189946
## hs_total_fruits_Ter(7,14.1] 0.1209089770
## hs_total_fruits_Ter(14.1,Inf] 0.1435003157
## hs_total_lipids_Ter(3,7] 0.0899136279
## hs_total_lipids_Ter(7,Inf] 0.0123260250
## hs_total_meat_Ter(6,9] -0.0878468700
## hs_total_meat_Ter(9,Inf] 0.0562160282
## hs_total_potatoes_Ter(3,4] 0.0419052347
## hs_total_potatoes_Ter(4,Inf] 0.0063978657
## hs_total_sweets_Ter(4.1,8.5] -0.0132130379
## hs_total_sweets_Ter(8.5,Inf] 0.0214976499
## hs_total_veg_Ter(6,8.5] 0.0251263489
## hs_total_veg_Ter(8.5,Inf] -0.0063443757
## hs_total_yog_Ter(6,8.5] -0.1244638717
## hs_total_yog_Ter(8.5,Inf] -0.1735410457
## metab_1 -0.0295213082
## metab_2 0.3195734320
## metab_3 0.0795853387
## metab_4 0.0288309508
## metab_5 0.4346177193
## metab_6 -0.1436012701
## metab_7 0.0459558548
## metab_8 0.3175861880
## metab_9 0.0152906095
## metab_10 0.0952122597
## metab_11 0.1694675707
## metab_12 -0.1660359079
## metab_13 -0.0038297823
## metab_14 -0.4970742096
## metab_15 0.0188682014
## metab_16 0.0231064608
## metab_17 -0.0235346972
## metab_18 -0.1695884023
## metab_19 -0.0371483323
## metab_20 0.0394122084
## metab_21 0.2105912253
## metab_22 -0.2610653180
## metab_23 0.0847677305
## metab_24 0.6619375568
## metab_25 -0.1089805262
## metab_26 -0.2207063688
## metab_27 0.3176588031
## metab_28 0.0680445057
## metab_29 -0.1153566419
## metab_30 0.1311635749
## metab_31 0.0338308730
## metab_32 -0.1043788271
## metab_33 -0.0208985538
## metab_34 -0.0213261670
## metab_35 0.0392658602
## metab_36 -0.0750942213
## metab_37 -0.1154602038
## metab_38 -0.0711691333
## metab_39 0.0155320687
## metab_40 0.3842504401
## metab_41 0.2699732505
## metab_42 -0.4211132608
## metab_43 -0.2544327475
## metab_44 -0.1003887020
## metab_45 0.1345853242
## metab_46 -0.1040905977
## metab_47 0.3877313287
## metab_48 -0.6204694495
## metab_49 0.1439882860
## metab_50 -0.2190410874
## metab_51 0.0936278718
## metab_52 0.4492993365
## metab_53 0.0472145681
## metab_54 0.1396824636
## metab_55 0.0212641147
## metab_56 -0.1519908835
## metab_57 0.1895795413
## metab_58 -0.1524379676
## metab_59 0.4142114253
## metab_60 -0.1238621846
## metab_61 0.0974977848
## metab_62 -0.0743434779
## metab_63 -0.1262671804
## metab_64 0.0557001612
## metab_65 0.0162441009
## metab_66 -0.0772017061
## metab_67 -0.1060707479
## metab_68 0.1226927869
## metab_69 -0.0714978228
## metab_70 -0.0354967992
## metab_71 -0.1025468347
## metab_72 -0.0011486058
## metab_73 -0.1124913789
## metab_74 0.0049302526
## metab_75 0.2712831558
## metab_76 -0.0409129940
## metab_77 0.0118715525
## metab_78 -0.2416737189
## metab_79 0.0248140704
## metab_80 0.0136979074
## metab_81 0.1464520439
## metab_82 -0.3784561456
## metab_83 -0.1160181990
## metab_84 -0.1843925941
## metab_85 -0.0115592395
## metab_86 0.2572779190
## metab_87 0.1196945818
## metab_88 0.3341508440
## metab_89 -0.3115081981
## metab_90 -0.0345660300
## metab_91 0.1231408378
## metab_92 0.1001063432
## metab_93 -0.0171792995
## metab_94 -0.0702095254
## metab_95 0.7246940973
## metab_96 0.3216368530
## metab_97 -0.1836736068
## metab_98 -0.0442240918
## metab_99 -0.4377534968
## metab_100 0.3516303988
## metab_101 0.0847176862
## metab_102 0.0743982233
## metab_103 -0.2241090916
## metab_104 0.2032091528
## metab_105 0.1636844792
## metab_106 0.0787586395
## metab_107 0.1542860823
## metab_108 -0.0778925253
## metab_109 -0.1632971637
## metab_110 -0.2522353816
## metab_111 -0.0577030260
## metab_112 0.0576370825
## metab_113 0.4726320826
## metab_114 0.0085569054
## metab_115 0.4018200815
## metab_116 0.0357272377
## metab_117 -0.2385287522
## metab_118 -0.1471657701
## metab_119 0.1070971184
## metab_120 -0.3566628093
## metab_121 0.0929328582
## metab_122 -0.2434188788
## metab_123 -0.1221182264
## metab_124 0.0402491838
## metab_125 -0.1615287101
## metab_126 0.0160595737
## metab_127 -0.0388707909
## metab_128 -0.0476813509
## metab_129 0.1293338868
## metab_130 -0.1427102001
## metab_131 -0.0175149369
## metab_132 0.0334583118
## metab_133 -0.3167027457
## metab_134 0.2446870827
## metab_135 -0.0930224192
## metab_136 -0.2050177897
## metab_137 -0.2643327704
## metab_138 -0.2879823660
## metab_139 0.0131976227
## metab_140 -0.0309655619
## metab_141 -0.0653123738
## metab_142 -0.3165277881
## metab_143 -0.2459000564
## metab_144 0.1067324105
## metab_145 -0.2741097101
## metab_146 -0.0123379020
## metab_147 0.2401296714
## metab_148 0.0293713012
## metab_149 0.0398798498
## metab_150 0.2082696815
## metab_151 0.0172807982
## metab_152 -0.0518151176
## metab_153 -0.1203378681
## metab_154 -0.0308513707
## metab_155 -0.2311355743
## metab_156 -0.0530828279
## metab_157 0.0893131660
## metab_158 0.1450322329
## metab_159 0.1858961771
## metab_160 -0.9603780180
## metab_161 1.2811254237
## metab_162 -0.0620154656
## metab_163 0.7268825358
## metab_164 -0.1176964721
## metab_165 -0.0256433248
## metab_166 -0.3012754912
## metab_167 -0.1031140369
## metab_168 -0.0743120699
## metab_169 0.0352504161
## metab_170 -0.0281957737
## metab_171 -0.0610543898
## metab_172 -0.0337957223
## metab_173 0.0555718488
## metab_174 -0.0666905814
## metab_175 -0.1952438136
## metab_176 -0.0779714629
## metab_177 0.1316563482
predictions <- predict(ridge_model, s = ridge_model$lambda.min, newx = x_test)
test_mse <- mean((predictions - y_test)^2)
cat("Mean Squared Error on Test Set:", test_mse, "\n")
## Mean Squared Error on Test Set: 0.7644475
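The lasso, ridge and elastic-net fits differ only in `alpha`. For the Gaussian family, glmnet minimizes

$$\frac{1}{2N}\sum_{i=1}^{N}\left(y_i-\beta_0-x_i^{\top}\beta\right)^2+\lambda\left[\frac{1-\alpha}{2}\lVert\beta\rVert_2^2+\alpha\lVert\beta\rVert_1\right],$$

so `alpha = 1` gives the lasso, `alpha = 0` gives ridge, and `alpha = 0.5` (used next) mixes the two. Because the penalty itself changes with `alpha`, the `lambda.min` values of the fits (0.0069, 0.139 and 0.0126) are not on a common scale. The small function below, a sketch for illustration only, evaluates just the bracketed penalty term for a given coefficient vector.

```r
# Elastic-net penalty as parameterized by glmnet:
# lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
enet_penalty <- function(beta, lambda, alpha) {
  stopifnot(alpha >= 0, alpha <= 1, lambda >= 0)
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

beta <- c(0.5, -1, 0)
enet_penalty(beta, lambda = 1, alpha = 1)    # lasso: sum(|beta|) = 1.5
enet_penalty(beta, lambda = 1, alpha = 0)    # ridge: sum(beta^2) / 2 = 0.625
enet_penalty(beta, lambda = 1, alpha = 0.5)  # 0.25 * 1.25 + 0.5 * 1.5 = 1.0625
```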
set.seed(101)
trainIndex <- createDataPartition(selected_metabolomics_data$hs_zbmi_who, p = .7,
                                  list = FALSE,
                                  times = 1)
train_data <- selected_metabolomics_data[ trainIndex,]
test_data <- selected_metabolomics_data[-trainIndex,]
x_train <- model.matrix(hs_zbmi_who ~ ., train_data)[,-1]
y_train <- train_data$hs_zbmi_who
x_test <- model.matrix(hs_zbmi_who ~ ., test_data)[,-1]
y_test <- test_data$hs_zbmi_who
enet_model <- cv.glmnet(x_train, y_train, alpha = 0.5, family = "gaussian")
plot(enet_model)
enet_model$lambda.min
## [1] 0.01263519
coef(enet_model, s = enet_model$lambda.min)
## 294 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 11.006140041
## hs_child_age_None -0.130439937
## h_cohort2 -0.163033682
## h_cohort3 0.298371433
## h_cohort4 0.333548803
## h_cohort5 .
## h_cohort6 .
## e3_sex_Nonemale 0.287219109
## e3_yearbir_None2004 -0.123982834
## e3_yearbir_None2005 -0.064281889
## e3_yearbir_None2006 .
## e3_yearbir_None2007 0.068365437
## e3_yearbir_None2008 .
## e3_yearbir_None2009 0.453512060
## h_edumc_None2 0.001280531
## h_edumc_None3 0.055154154
## h_native_None1 .
## h_native_None2 0.037131289
## hs_as_c_Log2 .
## hs_cd_c_Log2 0.004669753
## hs_co_c_Log2 0.006482867
## hs_cs_c_Log2 0.085646535
## hs_cu_c_Log2 0.161280919
## hs_hg_c_Log2 -0.051745069
## hs_mn_c_Log2 -0.160701347
## hs_mo_c_Log2 -0.056122932
## hs_pb_c_Log2 .
## hs_tl_cdich_NoneUndetected .
## hs_dde_cadj_Log2 -0.016576414
## hs_ddt_cadj_Log2 0.001755464
## hs_hcb_cadj_Log2 -0.153713609
## hs_pcb118_cadj_Log2 0.071582921
## hs_pcb138_cadj_Log2 -0.113550555
## hs_pcb153_cadj_Log2 -0.080358296
## hs_pcb170_cadj_Log2 -0.030024021
## hs_pcb180_cadj_Log2 -0.015707257
## hs_dep_cadj_Log2 -0.015711153
## hs_detp_cadj_Log2 0.009370132
## hs_dmdtp_cdich_NoneUndetected .
## hs_dmp_cadj_Log2 -0.004479767
## hs_dmtp_cadj_Log2 0.010633188
## hs_pbde153_cadj_Log2 -0.012527401
## hs_pbde47_cadj_Log2 .
## hs_pfhxs_c_Log2 0.008856441
## hs_pfna_c_Log2 0.011269716
## hs_pfoa_c_Log2 .
## hs_pfos_c_Log2 .
## hs_pfunda_c_Log2 .
## hs_bpa_cadj_Log2 -0.036539632
## hs_bupa_cadj_Log2 .
## hs_etpa_cadj_Log2 -0.001170266
## hs_mepa_cadj_Log2 -0.016009723
## hs_oxbe_cadj_Log2 -0.002083689
## hs_prpa_cadj_Log2 .
## hs_trcs_cadj_Log2 -0.014494152
## hs_mbzp_cadj_Log2 0.053615865
## hs_mecpp_cadj_Log2 .
## hs_mehhp_cadj_Log2 .
## hs_mehp_cadj_Log2 -0.028167338
## hs_meohp_cadj_Log2 .
## hs_mep_cadj_Log2 -0.015085677
## hs_mibp_cadj_Log2 0.008873337
## hs_mnbp_cadj_Log2 .
## hs_ohminp_cadj_Log2 .
## hs_oxominp_cadj_Log2 0.035695980
## FAS_cat_NoneMiddle 0.111626899
## FAS_cat_NoneHigh 0.029187361
## hs_contactfam_3cat_num_NoneOnce a week .
## hs_contactfam_3cat_num_NoneLess than once a week .
## hs_hm_pers_None 0.001477191
## hs_participation_3cat_None1 organisation -0.112152590
## hs_participation_3cat_None2 or more organisations -0.033605339
## hs_cotinine_cdich_NoneUndetected -0.002793010
## hs_globalexp2_Noneno exposure -0.028541213
## hs_smk_parents_Noneneither .
## hs_smk_parents_Noneone .
## h_bfdur_Ter(10.8,34.9] 0.103284228
## h_bfdur_Ter(34.9,Inf] 0.110185916
## hs_bakery_prod_Ter(2,6] .
## hs_bakery_prod_Ter(6,Inf] -0.094064386
## hs_beverages_Ter(0.132,1] 0.013981041
## hs_beverages_Ter(1,Inf] -0.043267301
## hs_break_cer_Ter(1.1,5.5] 0.001571217
## hs_break_cer_Ter(5.5,Inf] .
## hs_caff_drink_Ter(0.132,Inf] .
## hs_dairy_Ter(14.6,25.6] .
## hs_dairy_Ter(25.6,Inf] 0.127411570
## hs_fastfood_Ter(0.132,0.5] .
## hs_fastfood_Ter(0.5,Inf] -0.008334382
## h_legume_preg_Ter(0.5,2] .
## h_legume_preg_Ter(2,Inf] -0.002993820
## hs_org_food_Ter(0.132,1] 0.028415068
## hs_org_food_Ter(1,Inf] 0.012910554
## hs_proc_meat_Ter(1.5,4] 0.068888603
## hs_proc_meat_Ter(4,Inf] .
## hs_readymade_Ter(0.132,0.5] 0.078687908
## hs_readymade_Ter(0.5,Inf] 0.054527993
## hs_total_bread_Ter(7,17.5] .
## hs_total_bread_Ter(17.5,Inf] .
## hs_total_cereal_Ter(14.1,23.6] -0.006743963
## hs_total_cereal_Ter(23.6,Inf] -0.027185590
## hs_total_fish_Ter(1.5,3] .
## hs_total_fish_Ter(3,Inf] -0.020027760
## hs_total_fruits_Ter(7,14.1] 0.099749402
## hs_total_fruits_Ter(14.1,Inf] 0.142866343
## hs_total_lipids_Ter(3,7] 0.066450507
## hs_total_lipids_Ter(7,Inf] .
## hs_total_meat_Ter(6,9] -0.072888678
## hs_total_meat_Ter(9,Inf] 0.048682787
## hs_total_potatoes_Ter(3,4] 0.021291267
## hs_total_potatoes_Ter(4,Inf] .
## hs_total_sweets_Ter(4.1,8.5] -0.014390064
## hs_total_sweets_Ter(8.5,Inf] .
## hs_total_veg_Ter(6,8.5] 0.009430804
## hs_total_veg_Ter(8.5,Inf] .
## hs_total_yog_Ter(6,8.5] -0.140503743
## hs_total_yog_Ter(8.5,Inf] -0.209610929
## metab_1 -0.018410771
## metab_2 0.099036559
## metab_3 .
## metab_4 0.036380887
## metab_5 0.523343570
## metab_6 -0.074797573
## metab_7 .
## metab_8 0.187581279
## metab_9 .
## metab_10 0.099412468
## metab_11 0.206944220
## metab_12 -0.162271828
## metab_13 .
## metab_14 -0.397970264
## metab_15 .
## metab_16 .
## metab_17 .
## metab_18 -0.194820512
## metab_19 .
## metab_20 .
## metab_21 0.026501266
## metab_22 -0.277179631
## metab_23 0.061001467
## metab_24 0.693261373
## metab_25 -0.078513090
## metab_26 -0.163300595
## metab_27 0.456317231
## metab_28 .
## metab_29 -0.072604357
## metab_30 0.144824641
## metab_31 0.018718149
## metab_32 -0.126489323
## metab_33 .
## metab_34 .
## metab_35 .
## metab_36 .
## metab_37 -0.037179970
## metab_38 -0.041303026
## metab_39 .
## metab_40 0.131326981
## metab_41 0.295359586
## metab_42 -0.434480836
## metab_43 -0.197086826
## metab_44 -0.024152624
## metab_45 0.102824456
## metab_46 -0.019275421
## metab_47 0.423499726
## metab_48 -0.798963606
## metab_49 0.129970598
## metab_50 -0.169780977
## metab_51 .
## metab_52 0.445852439
## metab_53 .
## metab_54 0.164943984
## metab_55 .
## metab_56 -0.127503919
## metab_57 .
## metab_58 .
## metab_59 0.558234808
## metab_60 -0.115129334
## metab_61 .
## metab_62 .
## metab_63 -0.155494329
## metab_64 .
## metab_65 .
## metab_66 -0.023903239
## metab_67 -0.223317724
## metab_68 0.133007913
## metab_69 -0.030206838
## metab_70 .
## metab_71 -0.029599764
## metab_72 .
## metab_73 -0.138853661
## metab_74 .
## metab_75 0.289392883
## metab_76 .
## metab_77 0.015980730
## metab_78 -0.095641835
## metab_79 .
## metab_80 .
## metab_81 .
## metab_82 -0.646857224
## metab_83 .
## metab_84 -0.110262836
## metab_85 .
## metab_86 0.393908850
## metab_87 0.085559666
## metab_88 0.459848333
## metab_89 -1.089891018
## metab_90 .
## metab_91 0.118846688
## metab_92 0.092349686
## metab_93 .
## metab_94 -0.060607241
## metab_95 1.512549369
## metab_96 0.088504760
## metab_97 .
## metab_98 .
## metab_99 -0.509124675
## metab_100 0.540594691
## metab_101 .
## metab_102 .
## metab_103 -0.448602746
## metab_104 0.177794563
## metab_105 0.237934084
## metab_106 0.015657569
## metab_107 0.093934138
## metab_108 -0.086652898
## metab_109 -0.201468718
## metab_110 -0.199978263
## metab_111 .
## metab_112 .
## metab_113 0.584249016
## metab_114 .
## metab_115 0.463494145
## metab_116 .
## metab_117 .
## metab_118 -0.307858920
## metab_119 .
## metab_120 -0.344542306
## metab_121 .
## metab_122 .
## metab_123 -0.009022375
## metab_124 .
## metab_125 -0.158073341
## metab_126 .
## metab_127 -0.037277834
## metab_128 .
## metab_129 .
## metab_130 .
## metab_131 .
## metab_132 .
## metab_133 -0.410028240
## metab_134 0.380301071
## metab_135 -0.183420063
## metab_136 .
## metab_137 -0.329365844
## metab_138 -0.489857624
## metab_139 .
## metab_140 .
## metab_141 .
## metab_142 -0.525774668
## metab_143 -0.247629313
## metab_144 .
## metab_145 -0.207168640
## metab_146 .
## metab_147 0.389358798
## metab_148 .
## metab_149 .
## metab_150 0.244312340
## metab_151 .
## metab_152 -0.049537869
## metab_153 .
## metab_154 -0.021961373
## metab_155 -0.318887617
## metab_156 .
## metab_157 0.122614313
## metab_158 .
## metab_159 0.150331090
## metab_160 -2.130190016
## metab_161 2.404737771
## metab_162 .
## metab_163 0.613964063
## metab_164 -0.129795557
## metab_165 .
## metab_166 -0.332385450
## metab_167 -0.099342793
## metab_168 -0.013375959
## metab_169 .
## metab_170 -0.017874520
## metab_171 -0.093561008
## metab_172 .
## metab_173 .
## metab_174 .
## metab_175 -0.270360085
## metab_176 -0.040897127
## metab_177 0.095333463
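The full coefficient printout is hard to scan. A small helper can drop the intercept and the zeroed terms, then rank the remaining terms by absolute size. `nonzero_coefs()` is hypothetical. It assumes a named numeric vector, which, for a glmnet fit, could be obtained with something like `as.matrix(coef(enet_model, s = "lambda.min"))[, 1]`.

```r
# Keep non-zero, non-intercept coefficients, ordered by absolute size
nonzero_coefs <- function(coefs, top = 10) {
  coefs <- coefs[names(coefs) != "(Intercept)" & coefs != 0]
  coefs <- coefs[order(abs(coefs), decreasing = TRUE)]
  head(data.frame(term = names(coefs), estimate = unname(coefs)), top)
}

# Toy vector built from a few estimates printed above
toy <- c("(Intercept)" = 11.006140041, metab_161 = 2.404737771,
         metab_160 = -2.130190016, metab_3 = 0, hs_cu_c_Log2 = 0.161280919)
nonzero_coefs(toy)
```

Magnitudes are only comparable across predictors on a common scale. glmnet standardizes internally by default but reports coefficients on the original scale.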
predictions <- predict(enet_model, s = enet_model$lambda.min, newx = x_test)
test_mse <- mean((predictions - y_test)^2)
cat("Mean Squared Error on Test Set:", test_mse, "\n")
## Mean Squared Error on Test Set: 0.7618145
set.seed(101)
rf_model <- randomForest(hs_zbmi_who ~ ., data = train_data, ntree = 500)
rf_predictions <- predict(rf_model, newdata = test_data)
rf_mse <- mean((rf_predictions - y_test)^2)
cat("Random Forest Mean Squared Error on Test Set:", rf_mse, "\n")
## Random Forest Mean Squared Error on Test Set: 0.996276
importance(rf_model)
## IncNodePurity
## hs_child_age_None 3.7179863
## h_cohort 20.0678167
## e3_sex_None 0.4152372
## e3_yearbir_None 3.9912568
## h_edumc_None 0.8145915
## h_native_None 0.9930314
## hs_as_c_Log2 5.1882515
## hs_cd_c_Log2 4.5109001
## hs_co_c_Log2 4.1138086
## hs_cs_c_Log2 3.7660968
## hs_cu_c_Log2 11.6410604
## hs_hg_c_Log2 6.1211999
## hs_mn_c_Log2 4.3894712
## hs_mo_c_Log2 7.7337906
## hs_pb_c_Log2 5.8048984
## hs_tl_cdich_None 0.3460420
## hs_dde_cadj_Log2 9.6709540
## hs_ddt_cadj_Log2 4.6479697
## hs_hcb_cadj_Log2 81.7853196
## hs_pcb118_cadj_Log2 5.0669293
## hs_pcb138_cadj_Log2 15.1335648
## hs_pcb153_cadj_Log2 18.8308311
## hs_pcb170_cadj_Log2 53.0347502
## hs_pcb180_cadj_Log2 24.7612217
## hs_dep_cadj_Log2 4.8623860
## hs_detp_cadj_Log2 3.9848084
## hs_dmdtp_cdich_None 0.1633276
## hs_dmp_cadj_Log2 4.3354642
## hs_dmtp_cadj_Log2 3.1338208
## hs_pbde153_cadj_Log2 24.0975287
## hs_pbde47_cadj_Log2 5.2053622
## hs_pfhxs_c_Log2 5.2018150
## hs_pfna_c_Log2 4.2795388
## hs_pfoa_c_Log2 7.3415947
## hs_pfos_c_Log2 4.8793178
## hs_pfunda_c_Log2 3.9738862
## hs_bpa_cadj_Log2 3.2705516
## hs_bupa_cadj_Log2 4.1624466
## hs_etpa_cadj_Log2 4.0184711
## hs_mepa_cadj_Log2 4.6102107
## hs_oxbe_cadj_Log2 4.7227879
## hs_prpa_cadj_Log2 4.6000946
## hs_trcs_cadj_Log2 4.4038906
## hs_mbzp_cadj_Log2 4.4860652
## hs_mecpp_cadj_Log2 2.9958943
## hs_mehhp_cadj_Log2 2.9617891
## hs_mehp_cadj_Log2 3.7609006
## hs_meohp_cadj_Log2 3.3579843
## hs_mep_cadj_Log2 3.3191636
## hs_mibp_cadj_Log2 3.5163016
## hs_mnbp_cadj_Log2 3.5081694
## hs_ohminp_cadj_Log2 7.1113386
## hs_oxominp_cadj_Log2 4.8848057
## FAS_cat_None 1.1332639
## hs_contactfam_3cat_num_None 0.4729938
## hs_hm_pers_None 0.9827896
## hs_participation_3cat_None 1.1395005
## hs_cotinine_cdich_None 0.4033037
## hs_globalexp2_None 0.2212682
## hs_smk_parents_None 1.2282125
## h_bfdur_Ter 2.2523730
## hs_bakery_prod_Ter 1.6617517
## hs_beverages_Ter 0.8321733
## hs_break_cer_Ter 1.2573077
## hs_caff_drink_Ter 0.3830980
## hs_dairy_Ter 1.0584030
## hs_fastfood_Ter 0.6214335
## h_legume_preg_Ter 0.6275819
## hs_org_food_Ter 1.1564016
## hs_proc_meat_Ter 0.7252282
## hs_readymade_Ter 1.5786014
## hs_total_bread_Ter 0.9257676
## hs_total_cereal_Ter 0.8291572
## hs_total_fish_Ter 1.2014562
## hs_total_fruits_Ter 1.2313665
## hs_total_lipids_Ter 1.0080378
## hs_total_meat_Ter 1.1177749
## hs_total_potatoes_Ter 0.9514179
## hs_total_sweets_Ter 0.9739277
## hs_total_veg_Ter 1.3223020
## hs_total_yog_Ter 0.8120531
## metab_1 3.4103968
## metab_2 4.2519552
## metab_3 2.8253112
## metab_4 5.0328012
## metab_5 2.3199556
## metab_6 7.7226872
## metab_7 3.1136308
## metab_8 28.8473566
## metab_9 2.3992484
## metab_10 2.5682298
## metab_11 3.5363705
## metab_12 2.9445640
## metab_13 5.3822747
## metab_14 3.9698475
## metab_15 3.8123100
## metab_16 1.9134437
## metab_17 2.2871777
## metab_18 2.9810994
## metab_19 1.8741649
## metab_20 3.2159730
## metab_21 2.7049430
## metab_22 2.3935951
## metab_23 2.1264217
## metab_24 2.6925858
## metab_25 3.4997034
## metab_26 6.9049574
## metab_27 2.4665255
## metab_28 2.3559051
## metab_29 2.8692430
## metab_30 19.3559001
## metab_31 2.9967315
## metab_32 2.2831413
## metab_33 4.3849766
## metab_34 1.7226754
## metab_35 6.2043030
## metab_36 2.8524752
## metab_37 2.2554444
## metab_38 2.5925942
## metab_39 2.1880710
## metab_40 3.7960251
## metab_41 3.3313206
## metab_42 5.2761411
## metab_43 3.3846376
## metab_44 2.9932922
## metab_45 3.2500759
## metab_46 4.2604276
## metab_47 5.5495981
## metab_48 9.6501186
## metab_49 31.3854622
## metab_50 8.4607946
## metab_51 5.4544771
## metab_52 2.9014168
## metab_53 4.6450173
## metab_54 4.3782818
## metab_55 6.6757172
## metab_56 4.0731447
## metab_57 4.4282780
## metab_58 2.8378202
## metab_59 5.0114639
## metab_60 3.3753600
## metab_61 2.7162052
## metab_62 3.7035508
## metab_63 3.3138795
## metab_64 2.8957650
## metab_65 2.4842494
## metab_66 2.2621547
## metab_67 1.9474534
## metab_68 2.9192241
## metab_69 1.9739225
## metab_70 2.3863180
## metab_71 2.9557017
## metab_72 2.9690936
## metab_73 2.2610910
## metab_74 1.8242273
## metab_75 4.1514662
## metab_76 2.0297915
## metab_77 3.6949601
## metab_78 3.5339938
## metab_79 2.7201201
## metab_80 2.6306706
## metab_81 2.4352437
## metab_82 3.8473772
## metab_83 2.2366801
## metab_84 2.3773349
## metab_85 4.0974959
## metab_86 2.7145555
## metab_87 2.0444677
## metab_88 2.5801736
## metab_89 2.0778583
## metab_90 2.3225194
## metab_91 2.7620356
## metab_92 2.5208391
## metab_93 2.6031019
## metab_94 7.5697294
## metab_95 37.8195728
## metab_96 5.2325276
## metab_97 2.9415720
## metab_98 3.2182089
## metab_99 2.9630015
## metab_100 3.2915557
## metab_101 2.2303377
## metab_102 4.2271153
## metab_103 2.7239211
## metab_104 4.9548647
## metab_105 2.9039383
## metab_106 3.2112768
## metab_107 3.0155846
## metab_108 3.2761000
## metab_109 4.2307324
## metab_110 5.3342663
## metab_111 2.3230907
## metab_112 2.1428025
## metab_113 3.9232702
## metab_114 2.6065165
## metab_115 3.3635485
## metab_116 3.7914752
## metab_117 6.0709773
## metab_118 2.6173281
## metab_119 4.3248031
## metab_120 5.6256433
## metab_121 3.2107808
## metab_122 4.5539424
## metab_123 3.0342095
## metab_124 2.5387073
## metab_125 2.3113477
## metab_126 2.5368761
## metab_127 6.7322852
## metab_128 5.8159299
## metab_129 2.9355728
## metab_130 2.4297524
## metab_131 2.2460621
## metab_132 2.7378197
## metab_133 2.0136951
## metab_134 2.7562347
## metab_135 4.5031383
## metab_136 4.2744372
## metab_137 4.8392084
## metab_138 2.4677686
## metab_139 3.0501414
## metab_140 2.1809204
## metab_141 4.7253654
## metab_142 9.7346837
## metab_143 7.2110234
## metab_144 3.0357367
## metab_145 3.2383676
## metab_146 3.7965463
## metab_147 2.6759288
## metab_148 3.1408076
## metab_149 4.4879897
## metab_150 4.4928516
## metab_151 3.3962656
## metab_152 4.3480450
## metab_153 3.4106608
## metab_154 4.3054529
## metab_155 2.1063499
## metab_156 2.1672697
## metab_157 3.3043247
## metab_158 3.1195794
## metab_159 2.8363586
## metab_160 4.8829846
## metab_161 20.1865761
## metab_162 2.8148182
## metab_163 16.3606902
## metab_164 5.3775858
## metab_165 3.3140682
## metab_166 2.8479380
## metab_167 2.1778484
## metab_168 2.4356618
## metab_169 3.4685483
## metab_170 4.0539788
## metab_171 3.7686593
## metab_172 3.8229209
## metab_173 3.6857036
## metab_174 3.2291133
## metab_175 3.8641576
## metab_176 4.1128161
## metab_177 10.2704261
varImpPlot(rf_model)
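According to the randomForest documentation, `IncNodePurity` for regression is the total decrease in node impurity, measured by the residual sum of squares (RSS), from splits on a variable, averaged over all trees. The base-R sketch below illustrates that quantity for a single split. `split_gain()` and `best_split()` are hypothetical helpers, not randomForest internals.

```r
# Residual sum of squares of a node
rss <- function(y) sum((y - mean(y))^2)

# RSS decrease from splitting a node's responses y at threshold t on x
split_gain <- function(x, y, t) {
  rss(y) - rss(y[x <= t]) - rss(y[x > t])
}

# Best single split on one predictor: try midpoints between sorted values
best_split <- function(x, y) {
  xs <- sort(unique(x))
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2
  gains <- vapply(cuts, function(t) split_gain(x, y, t), numeric(1))
  c(threshold = cuts[which.max(gains)], gain = max(gains))
}

best_split(x = 1:4, y = c(1, 1, 5, 5))
```

Impurity-based importance tends to favour predictors with many candidate split points. This may partly explain why continuous exposures and metabolites outrank the tertile-coded diet variables in the table above.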
set.seed(101)
gbm_model <- gbm(hs_zbmi_who ~ ., data = train_data,
distribution = "gaussian",
n.trees = 1000,
interaction.depth = 3,
n.minobsinnode = 10,
shrinkage = 0.01,
cv.folds = 5,
verbose = TRUE)
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.4850 nan 0.0100 0.0050
## 2 1.4785 nan 0.0100 0.0052
## 3 1.4726 nan 0.0100 0.0052
## 4 1.4661 nan 0.0100 0.0054
## 5 1.4594 nan 0.0100 0.0052
## 6 1.4545 nan 0.0100 0.0041
## 7 1.4497 nan 0.0100 0.0035
## 8 1.4443 nan 0.0100 0.0052
## 9 1.4379 nan 0.0100 0.0053
## 10 1.4327 nan 0.0100 0.0047
## 20 1.3835 nan 0.0100 0.0051
## 40 1.2970 nan 0.0100 0.0031
## 60 1.2267 nan 0.0100 0.0021
## 80 1.1648 nan 0.0100 0.0012
## 100 1.1124 nan 0.0100 0.0010
## 120 1.0654 nan 0.0100 0.0009
## 140 1.0223 nan 0.0100 0.0006
## 160 0.9821 nan 0.0100 -0.0001
## 180 0.9452 nan 0.0100 0.0003
## 200 0.9131 nan 0.0100 0.0009
## 220 0.8831 nan 0.0100 0.0007
## 240 0.8551 nan 0.0100 0.0007
## 260 0.8287 nan 0.0100 0.0001
## 280 0.8047 nan 0.0100 0.0005
## 300 0.7812 nan 0.0100 0.0002
## 320 0.7593 nan 0.0100 -0.0002
## 340 0.7381 nan 0.0100 0.0003
## 360 0.7191 nan 0.0100 -0.0001
## 380 0.7002 nan 0.0100 0.0001
## 400 0.6816 nan 0.0100 0.0010
## 420 0.6652 nan 0.0100 0.0001
## 440 0.6489 nan 0.0100 0.0003
## 460 0.6334 nan 0.0100 -0.0000
## 480 0.6189 nan 0.0100 0.0001
## 500 0.6053 nan 0.0100 -0.0002
## 520 0.5924 nan 0.0100 0.0001
## 540 0.5792 nan 0.0100 0.0000
## 560 0.5665 nan 0.0100 0.0001
## 580 0.5538 nan 0.0100 0.0002
## 600 0.5425 nan 0.0100 -0.0002
## 620 0.5308 nan 0.0100 0.0002
## 640 0.5198 nan 0.0100 0.0001
## 660 0.5101 nan 0.0100 -0.0001
## 680 0.4999 nan 0.0100 -0.0000
## 700 0.4900 nan 0.0100 -0.0001
## 720 0.4801 nan 0.0100 -0.0000
## 740 0.4715 nan 0.0100 0.0000
## 760 0.4632 nan 0.0100 -0.0004
## 780 0.4550 nan 0.0100 -0.0001
## 800 0.4472 nan 0.0100 -0.0002
## 820 0.4392 nan 0.0100 -0.0000
## 840 0.4314 nan 0.0100 -0.0001
## 860 0.4235 nan 0.0100 -0.0000
## 880 0.4162 nan 0.0100 -0.0001
## 900 0.4091 nan 0.0100 0.0000
## 920 0.4019 nan 0.0100 0.0001
## 940 0.3951 nan 0.0100 -0.0001
## 960 0.3891 nan 0.0100 -0.0002
## 980 0.3825 nan 0.0100 -0.0003
## 1000 0.3760 nan 0.0100 -0.0002
best_trees <- gbm.perf(gbm_model, method = "cv")
gbm_predictions <- predict(gbm_model, newdata = test_data, n.trees = best_trees)
gbm_mse <- mean((gbm_predictions - y_test)^2)
cat("GBM Mean Squared Error on Test Set:", gbm_mse, "\n")
## GBM Mean Squared Error on Test Set: 0.8785297
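Collecting the four test-set MSEs reported above makes the comparison easier to read. The RMSE is in BMI z-score units. The sketch only rearranges numbers already printed in this output.

```r
# Test-set MSEs reported above (single 70/30 split, seed 101)
mse <- c("Ridge" = 0.7644475, "Elastic net" = 0.7618145,
         "Random forest" = 0.996276, "GBM" = 0.8785297)

comparison <- data.frame(model = names(mse), mse = unname(mse),
                         rmse = sqrt(unname(mse)))
comparison <- comparison[order(comparison$mse), ]
rownames(comparison) <- NULL
comparison
```

The ridge and elastic-net results differ only in the third decimal, which a single train/test split cannot reliably resolve. The MSE of always predicting the training mean, `mean((mean(y_train) - y_test)^2)`, would give a useful baseline; its value is not shown in this output.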
gbm_importance <- summary(gbm_model)
print(gbm_importance)
## var rel.inf
## hs_hcb_cadj_Log2 hs_hcb_cadj_Log2 8.09001922
## hs_pcb170_cadj_Log2 hs_pcb170_cadj_Log2 6.44638609
## metab_95 metab_95 4.70041113
## metab_161 metab_161 4.06808351
## metab_49 metab_49 3.88961561
## metab_8 metab_8 3.69537755
## h_cohort h_cohort 3.42481063
## hs_pbde153_cadj_Log2 hs_pbde153_cadj_Log2 2.87770134
## metab_163 metab_163 2.48868901
## hs_cu_c_Log2 hs_cu_c_Log2 2.30170839
## metab_30 metab_30 2.13973938
## metab_48 metab_48 2.00156103
## metab_142 metab_142 1.89265886
## hs_pcb180_cadj_Log2 hs_pcb180_cadj_Log2 1.57747783
## metab_160 metab_160 1.50826453
## metab_177 metab_177 1.47834807
## metab_42 metab_42 1.41028628
## metab_50 metab_50 1.23706726
## metab_6 metab_6 1.18821993
## metab_26 metab_26 1.04537234
## hs_pfoa_c_Log2 hs_pfoa_c_Log2 1.00358351
## metab_143 metab_143 0.91541695
## metab_47 metab_47 0.87612633
## h_bfdur_Ter h_bfdur_Ter 0.85727456
## metab_104 metab_104 0.82239196
## hs_mo_c_Log2 hs_mo_c_Log2 0.81088322
## metab_141 metab_141 0.80475855
## metab_94 metab_94 0.76664002
## metab_59 metab_59 0.75820581
## metab_128 metab_128 0.74311041
## metab_113 metab_113 0.71170889
## hs_pcb138_cadj_Log2 hs_pcb138_cadj_Log2 0.67929455
## metab_122 metab_122 0.66113535
## hs_dde_cadj_Log2 hs_dde_cadj_Log2 0.65406210
## metab_120 metab_120 0.63798282
## metab_82 metab_82 0.61547454
## metab_75 metab_75 0.60200645
## hs_pfos_c_Log2 hs_pfos_c_Log2 0.59579191
## hs_co_c_Log2 hs_co_c_Log2 0.55915743
## metab_117 metab_117 0.55344546
## hs_hg_c_Log2 hs_hg_c_Log2 0.55133300
## metab_110 metab_110 0.54866474
## metab_137 metab_137 0.49236959
## metab_78 metab_78 0.48947185
## metab_150 metab_150 0.46597858
## metab_44 metab_44 0.45591644
## metab_99 metab_99 0.45193551
## metab_127 metab_127 0.44022048
## metab_81 metab_81 0.43074687
## hs_mbzp_cadj_Log2 hs_mbzp_cadj_Log2 0.42872108
## hs_child_age_None hs_child_age_None 0.42168270
## metab_172 metab_172 0.41997221
## metab_31 metab_31 0.40523427
## metab_171 metab_171 0.39806209
## metab_54 metab_54 0.39617569
## metab_149 metab_149 0.38775627
## metab_7 metab_7 0.37659112
## hs_pcb153_cadj_Log2 hs_pcb153_cadj_Log2 0.37331000
## hs_bakery_prod_Ter hs_bakery_prod_Ter 0.37000119
## metab_62 metab_62 0.36123181
## metab_55 metab_55 0.35370110
## metab_14 metab_14 0.34628087
## metab_57 metab_57 0.34327352
## hs_oxominp_cadj_Log2 hs_oxominp_cadj_Log2 0.33105628
## metab_115 metab_115 0.33100201
## hs_pfhxs_c_Log2 hs_pfhxs_c_Log2 0.31821178
## metab_119 metab_119 0.30771037
## e3_sex_None e3_sex_None 0.30067894
## hs_detp_cadj_Log2 hs_detp_cadj_Log2 0.29525560
## hs_mepa_cadj_Log2 hs_mepa_cadj_Log2 0.27705232
## hs_pb_c_Log2 hs_pb_c_Log2 0.27509679
## metab_152 metab_152 0.24780912
## metab_96 metab_96 0.24249150
## metab_91 metab_91 0.24221864
## metab_136 metab_136 0.23660925
## e3_yearbir_None e3_yearbir_None 0.23524288
## metab_108 metab_108 0.23394912
## metab_138 metab_138 0.23111353
## metab_2 metab_2 0.23018862
## metab_103 metab_103 0.22954588
## hs_pbde47_cadj_Log2 hs_pbde47_cadj_Log2 0.22876232
## metab_146 metab_146 0.22606326
## metab_92 metab_92 0.22580396
## metab_37 metab_37 0.22268381
## metab_24 metab_24 0.20052952
## metab_40 metab_40 0.20022361
## metab_53 metab_53 0.18985083
## hs_cs_c_Log2 hs_cs_c_Log2 0.18823375
## metab_56 metab_56 0.18539169
## metab_130 metab_130 0.18086152
## metab_64 metab_64 0.17729134
## metab_60 metab_60 0.17623616
## hs_mn_c_Log2 hs_mn_c_Log2 0.17429784
## metab_170 metab_170 0.17254842
## metab_35 metab_35 0.17189518
## metab_27 metab_27 0.16900934
## hs_meohp_cadj_Log2 hs_meohp_cadj_Log2 0.16717111
## metab_121 metab_121 0.15944674
## metab_71 metab_71 0.15868009
## metab_4 metab_4 0.15860609
## metab_105 metab_105 0.15837794
## hs_etpa_cadj_Log2 hs_etpa_cadj_Log2 0.15720629
## hs_trcs_cadj_Log2 hs_trcs_cadj_Log2 0.15549514
## metab_164 metab_164 0.15078827
## metab_144 metab_144 0.14991119
## metab_114 metab_114 0.14960583
## FAS_cat_None FAS_cat_None 0.14815062
## metab_85 metab_85 0.14813786
## metab_15 metab_15 0.14684638
## metab_29 metab_29 0.14638264
## metab_41 metab_41 0.14220298
## metab_100 metab_100 0.14150099
## metab_109 metab_109 0.14123575
## metab_123 metab_123 0.13878638
## metab_23 metab_23 0.13759614
## metab_33 metab_33 0.13679938
## metab_77 metab_77 0.13511750
## metab_165 metab_165 0.13341858
## metab_133 metab_133 0.13302208
## metab_107 metab_107 0.13201222
## metab_153 metab_153 0.13087818
## metab_151 metab_151 0.12947346
## metab_12 metab_12 0.12694644
## hs_readymade_Ter hs_readymade_Ter 0.12570733
## metab_116 metab_116 0.12512748
## metab_129 metab_129 0.12260370
## metab_10 metab_10 0.12235663
## metab_176 metab_176 0.12175295
## metab_135 metab_135 0.12167333
## metab_46 metab_46 0.11972652
## metab_83 metab_83 0.11938083
## metab_51 metab_51 0.11378695
## metab_97 metab_97 0.11237984
## hs_dep_cadj_Log2 hs_dep_cadj_Log2 0.10984940
## hs_participation_3cat_None hs_participation_3cat_None 0.10886221
## metab_132 metab_132 0.10846069
## hs_ohminp_cadj_Log2 hs_ohminp_cadj_Log2 0.10642128
## metab_131 metab_131 0.10629006
## hs_oxbe_cadj_Log2 hs_oxbe_cadj_Log2 0.10534883
## metab_45 metab_45 0.10437452
## metab_52 metab_52 0.10178562
## metab_154 metab_154 0.10111120
## metab_159 metab_159 0.09952848
## metab_93 metab_93 0.09611185
## metab_43 metab_43 0.09378788
## metab_157 metab_157 0.09275885
## metab_145 metab_145 0.09247463
## hs_bupa_cadj_Log2 hs_bupa_cadj_Log2 0.08962850
## metab_3 metab_3 0.08882494
## metab_5 metab_5 0.08845575
## metab_166 metab_166 0.08839111
## metab_25 metab_25 0.08833395
## metab_175 metab_175 0.08676896
## metab_88 metab_88 0.08665773
## metab_167 metab_167 0.08661254
## metab_36 metab_36 0.08642973
## metab_134 metab_134 0.08463781
## hs_bpa_cadj_Log2 hs_bpa_cadj_Log2 0.08277293
## metab_79 metab_79 0.08274879
## metab_173 metab_173 0.08225466
## metab_11 metab_11 0.08131778
## metab_125 metab_125 0.08115445
## metab_118 metab_118 0.07940961
## hs_cd_c_Log2 hs_cd_c_Log2 0.07835982
## metab_68 metab_68 0.07536753
## hs_dairy_Ter hs_dairy_Ter 0.07190689
## metab_9 metab_9 0.07127219
## metab_124 metab_124 0.06968439
## metab_147 metab_147 0.06865321
## hs_ddt_cadj_Log2 hs_ddt_cadj_Log2 0.06849555
## metab_86 metab_86 0.06674128
## metab_1 metab_1 0.06568159
## metab_162 metab_162 0.06556168
## metab_20 metab_20 0.06369587
## h_native_None h_native_None 0.06291201
## hs_mep_cadj_Log2 hs_mep_cadj_Log2 0.06136428
## metab_28 metab_28 0.06096310
## metab_139 metab_139 0.05559421
## hs_org_food_Ter hs_org_food_Ter 0.05500296
## hs_as_c_Log2 hs_as_c_Log2 0.05361163
## metab_21 metab_21 0.05217361
## metab_158 metab_158 0.05167365
## metab_63 metab_63 0.05131268
## metab_76 metab_76 0.05046016
## metab_74 metab_74 0.04969322
## metab_174 metab_174 0.04879932
## metab_66 metab_66 0.04869034
## metab_67 metab_67 0.04786547
## metab_168 metab_168 0.04710700
## hs_total_fruits_Ter hs_total_fruits_Ter 0.04695288
## metab_22 metab_22 0.04591996
## metab_111 metab_111 0.04560449
## metab_84 metab_84 0.04368034
## hs_total_lipids_Ter hs_total_lipids_Ter 0.04272534
## metab_98 metab_98 0.04213272
## hs_mnbp_cadj_Log2 hs_mnbp_cadj_Log2 0.03942191
## hs_prpa_cadj_Log2 hs_prpa_cadj_Log2 0.03874694
## metab_38 metab_38 0.03743543
## metab_65 metab_65 0.03678179
## metab_155 metab_155 0.03362268
## metab_72 metab_72 0.03314292
## metab_80 metab_80 0.03149383
## hs_dmp_cadj_Log2 hs_dmp_cadj_Log2 0.02884955
## metab_70 metab_70 0.02849324
## metab_61 metab_61 0.02784492
## hs_pfunda_c_Log2 hs_pfunda_c_Log2 0.02726959
## hs_hm_pers_None hs_hm_pers_None 0.02671761
## metab_39 metab_39 0.02586278
## hs_total_fish_Ter hs_total_fish_Ter 0.02483172
## metab_169 metab_169 0.02315660
## metab_148 metab_148 0.02288396
## hs_beverages_Ter hs_beverages_Ter 0.02261617
## metab_89 metab_89 0.02247596
## hs_mehhp_cadj_Log2 hs_mehhp_cadj_Log2 0.02224522
## hs_total_yog_Ter hs_total_yog_Ter 0.02132102
## hs_mehp_cadj_Log2 hs_mehp_cadj_Log2 0.02082878
## metab_16 metab_16 0.02037888
## metab_32 metab_32 0.01927902
## metab_73 metab_73 0.01550597
## hs_total_veg_Ter hs_total_veg_Ter 0.01526521
## hs_mibp_cadj_Log2 hs_mibp_cadj_Log2 0.01493790
## hs_dmtp_cadj_Log2 hs_dmtp_cadj_Log2 0.01381684
## hs_pfna_c_Log2 hs_pfna_c_Log2 0.01380861
## metab_69 metab_69 0.01317794
## metab_140 metab_140 0.01292028
## metab_126 metab_126 0.01290988
## hs_total_sweets_Ter hs_total_sweets_Ter 0.01248655
## metab_34 metab_34 0.01229531
## hs_total_potatoes_Ter hs_total_potatoes_Ter 0.01222084
## metab_18 metab_18 0.01155753
## metab_112 metab_112 0.01008567
## h_edumc_None h_edumc_None 0.00000000
## hs_tl_cdich_None hs_tl_cdich_None 0.00000000
## hs_pcb118_cadj_Log2 hs_pcb118_cadj_Log2 0.00000000
## hs_dmdtp_cdich_None hs_dmdtp_cdich_None 0.00000000
## hs_mecpp_cadj_Log2 hs_mecpp_cadj_Log2 0.00000000
## hs_contactfam_3cat_num_None hs_contactfam_3cat_num_None 0.00000000
## hs_cotinine_cdich_None hs_cotinine_cdich_None 0.00000000
## hs_globalexp2_None hs_globalexp2_None 0.00000000
## hs_smk_parents_None hs_smk_parents_None 0.00000000
## hs_break_cer_Ter hs_break_cer_Ter 0.00000000
## hs_caff_drink_Ter hs_caff_drink_Ter 0.00000000
## hs_fastfood_Ter hs_fastfood_Ter 0.00000000
## h_legume_preg_Ter h_legume_preg_Ter 0.00000000
## hs_proc_meat_Ter hs_proc_meat_Ter 0.00000000
## hs_total_bread_Ter hs_total_bread_Ter 0.00000000
## hs_total_cereal_Ter hs_total_cereal_Ter 0.00000000
## hs_total_meat_Ter hs_total_meat_Ter 0.00000000
## metab_13 metab_13 0.00000000
## metab_17 metab_17 0.00000000
## metab_19 metab_19 0.00000000
## metab_58 metab_58 0.00000000
## metab_87 metab_87 0.00000000
## metab_90 metab_90 0.00000000
## metab_101 metab_101 0.00000000
## metab_102 metab_102 0.00000000
## metab_106 metab_106 0.00000000
## metab_156 metab_156 0.00000000
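`summary.gbm()` normalizes relative influence to sum to 100 by default. One way to read the long table above is to ask how much of the total the top few predictors account for. `cumulative_influence()` is a hypothetical helper. For the fitted model, it could be called as `cumulative_influence(setNames(gbm_importance$rel.inf, as.character(gbm_importance$var)))`.

```r
# Cumulative share of relative influence for the top k predictors
cumulative_influence <- function(rel_inf, k = 10) {
  pct <- sort(100 * rel_inf / sum(rel_inf), decreasing = TRUE)  # re-normalize to 100
  k <- min(k, length(pct))
  data.frame(var = names(pct)[seq_len(k)],
             rel_inf = unname(pct)[seq_len(k)],
             cum_pct = cumsum(unname(pct))[seq_len(k)])
}

# Toy influence vector
cumulative_influence(c(a = 2, b = 6, c = 2), k = 2)
```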
# Complete-case analysis: drop every row with a missing value (this can bias estimates if values are not missing at random)
combined_data <- combined_data %>% na.omit()
# Dichotomize hs_zbmi_who at its median (1 = above median, 0 = at or below); this discards information in the continuous z-score
median_value <- median(combined_data$hs_zbmi_who, na.rm = TRUE)
combined_data$hs_zbmi_who_binary <- ifelse(combined_data$hs_zbmi_who > median_value, 1, 0)
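The median split above uses a strict `>`. As a result, children whose z-score equals the median all fall into class 0, and the two classes need not be exactly balanced. The toy example below illustrates this. `median_split()` simply wraps the two lines above.

```r
# Same rule as above: strictly above the median -> 1, otherwise 0
median_split <- function(z) {
  m <- median(z, na.rm = TRUE)
  ifelse(z > m, 1, 0)
}

z <- c(-1.2, 0.3, 0.3, 0.3, 1.5)   # median is 0.3
median_split(z)                     # 0 0 0 0 1
table(median_split(z))
```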
set.seed(101)
trainIndex <- createDataPartition(combined_data$hs_zbmi_who_binary, p = .7, list = FALSE, times = 1)
train_data <- combined_data[trainIndex,]
test_data <- combined_data[-trainIndex,]
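The caret documentation states that a factor `y` is sampled within each class, while a numeric `y` is first binned by percentiles. For this reason, wrapping the 0/1 outcome in `factor()` may be the more direct way to request class-wise stratification. The following is a base-R sketch of what a class-stratified 70/30 split does. `stratified_split()` is hypothetical and is not part of caret.

```r
# Sample a fraction p of row indices within each outcome class
stratified_split <- function(y, p = 0.7, seed = 101) {
  set.seed(seed)
  idx <- split(seq_along(y), y)                 # row indices per class
  train <- lapply(idx, function(i) i[sample.int(length(i), round(p * length(i)))])
  sort(unlist(train, use.names = FALSE))
}

y <- rep(c(0, 1), times = c(60, 40))
tr <- stratified_split(y)
table(y[tr])   # class balance preserved in the training rows
```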
# Keep complete cases only; redundant after na.omit() above, but a harmless safeguard
train_data_clean <- train_data[complete.cases(train_data), ]
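# NOTE: `. - ID` keeps the continuous hs_zbmi_who, from which the binary outcome is derived,
# as a predictor (target leakage; note its large coefficient in the output below).
# Adding `- hs_zbmi_who` to both model.matrix() formulas would remove it.
x_train <- model.matrix(hs_zbmi_who_binary ~ . - ID, data = train_data_clean)[,-1]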
y_train <- as.numeric(train_data_clean$hs_zbmi_who_binary)
test_data_clean <- test_data[complete.cases(test_data), ]
x_test <- model.matrix(hs_zbmi_who_binary ~ . - ID, data = test_data_clean)[,-1]
y_test <- as.numeric(test_data_clean$hs_zbmi_who_binary)
# make sure dimensions match
#dim(x_train): 840 295
#length(y_train): 840
#dim(x_test): 358 295
#length(y_test): 358
num_chemicals <- length(chemicals_full)
num_diet <- length(postnatal_diet)
num_metabolomics <- ncol(metabol_serum_transposed) - 1 # Excluding ID
num_covariates <- ncol(outcome_and_cov) - 2 # Excluding ID and outcome
# combine all the lengths
total_length <- num_chemicals + num_diet + num_metabolomics + num_covariates
# define groups
group_indices <- c(
rep(1, num_chemicals), # Group 1: Chemicals
rep(2, num_diet), # Group 2: Postnatal diet
rep(3, num_metabolomics), # Group 3: Metabolomics (excluding ID)
rep(4, num_covariates) # Group 4: Covariates (excluding ID and outcome)
)
# make sure length of group_indices matches x_train
length(group_indices) == ncol(x_train)
## [1] FALSE
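# Pad with a catch-all group 5 if group_indices is too short. Caution: model.matrix() expands each
# factor into several dummy columns and keeps the data frame's column order (covariates first here),
# so these position-based groups need not line up with x_train's columns; padding fixes only the length.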
if (length(group_indices) < ncol(x_train)) {
group_indices <- c(group_indices, rep(5, ncol(x_train) - length(group_indices)))
}
# making sure length of group_indices matches the number of columns in x_train
length(group_indices) == ncol(x_train)
## [1] TRUE
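One way to make the groups follow the variables rather than column positions is to use the `"assign"` attribute of `model.matrix()`. This attribute maps every column, including each dummy column of a factor, back to the term it came from. The sketch below is written under that assumption. `column_groups()` and the `var_groups` lookup (variable name to group label) are hypothetical.

```r
# One group label per model-matrix column, derived from the source variable
column_groups <- function(formula, data, var_groups) {
  mm <- model.matrix(formula, data)
  assign <- attr(mm, "assign")                            # term index per column; 0 = intercept
  labels <- attr(terms(formula, data = data), "term.labels")
  vars <- labels[assign[assign > 0]]                      # source variable of each non-intercept column
  unname(var_groups[vars])                                # NA for variables missing from var_groups
}

# Toy data: the 3-level factor `cohort` expands into 2 dummy columns
df <- data.frame(y = c(0, 1, 0, 1, 1, 0),
                 age = c(6, 7, 8, 9, 10, 11),
                 cohort = factor(c("a", "b", "c", "a", "b", "c")),
                 metab_1 = c(0.1, 0.4, 0.3, 0.9, 0.5, 0.2))
groups <- c(age = "covariate", cohort = "covariate", metab_1 = "metabolomics")
column_groups(y ~ ., df, groups)
```

`as.integer(factor(...))` converts these labels into the integer index that grplasso takes. Its documentation uses `NA` in `index` to mark unpenalized coefficients.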
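# Fit a group-lasso logistic regression (LogReg()) at a fixed lambda = 0.1.
# grplasso expects the design matrix to include an intercept column with index NA; x_train has none,
# hence the message below and a fit without an intercept.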
group_lasso_model <- grplasso(x_train, y_train, index = group_indices, lambda = 0.1, model = LogReg())
## Couldn't find intercept. Setting center = FALSE.
## Lambda: 0.1 nr.var: 288
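The group lasso penalizes the Euclidean norm of each group's coefficient vector. With grplasso's default scaling, that norm is multiplied by the square root of the group size. The sketch below assumes that default (`penscale = sqrt`). `group_lasso_penalty()` is a hypothetical helper, not a grplasso function.

```r
# lambda * sum over groups g of sqrt(|g|) * ||beta_g||_2; NA-indexed coefficients are unpenalized
group_lasso_penalty <- function(beta, index, lambda) {
  keep <- !is.na(index)
  norms <- tapply(beta[keep], index[keep],
                  function(b) sqrt(length(b)) * sqrt(sum(b^2)))
  lambda * sum(norms)
}

group_lasso_penalty(beta = c(3, 4, 1), index = c(1, 1, 2), lambda = 0.1)
```

Because the penalty acts on whole-group norms, each group enters or leaves the model as a unit. With only a handful of large groups, a fit like the one above can therefore retain most columns.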
group_lasso_coef <- coef(group_lasso_model)
print(group_lasso_coef)
## 0.1
## hs_child_age_None -0.428247931
## h_cohort2 3.790974647
## h_cohort3 1.458305858
## h_cohort4 -2.248105797
## h_cohort5 3.008613818
## h_cohort6 -2.180512136
## e3_sex_Nonemale -0.004781578
## e3_yearbir_None2004 -0.818194783
## e3_yearbir_None2005 0.724572411
## e3_yearbir_None2006 1.841006426
## e3_yearbir_None2007 4.368309997
## e3_yearbir_None2008 3.647524519
## e3_yearbir_None2009 2.760468960
## h_edumc_None2 -0.013269380
## h_edumc_None3 0.635238844
## h_native_None1 2.307689630
## h_native_None2 1.624958848
## hs_zbmi_who 11.799243602
## hs_as_c_Log2 0.124009339
## hs_cd_c_Log2 -0.057903007
## hs_co_c_Log2 0.331480477
## hs_cs_c_Log2 -0.482528292
## hs_cu_c_Log2 -0.549117367
## hs_hg_c_Log2 -0.180919382
## hs_mn_c_Log2 -0.894208992
## hs_mo_c_Log2 0.607398729
## hs_pb_c_Log2 0.944352299
## hs_tl_cdich_NoneUndetected -1.247263826
## hs_dde_cadj_Log2 0.215447852
## hs_ddt_cadj_Log2 0.029158124
## hs_hcb_cadj_Log2 0.086441350
## hs_pcb118_cadj_Log2 -0.179176592
## hs_pcb138_cadj_Log2 -0.090198308
## hs_pcb153_cadj_Log2 -0.066189710
## hs_pcb170_cadj_Log2 -0.014665263
## hs_pcb180_cadj_Log2 0.136217359
## hs_dep_cadj_Log2 -0.191773179
## hs_detp_cadj_Log2 -0.067699450
## hs_dmdtp_cdich_NoneUndetected -0.518626751
## hs_dmp_cadj_Log2 -0.047527022
## hs_dmtp_cadj_Log2 -0.033610375
## hs_pbde153_cadj_Log2 -0.171989622
## hs_pbde47_cadj_Log2 -0.267047646
## hs_pfhxs_c_Log2 0.303379975
## hs_pfna_c_Log2 -0.339576729
## hs_pfoa_c_Log2 -0.504836925
## hs_pfos_c_Log2 0.375078346
## hs_pfunda_c_Log2 0.107636563
## hs_bpa_cadj_Log2 -0.025852450
## hs_bupa_cadj_Log2 -0.098893018
## hs_etpa_cadj_Log2 0.146563081
## hs_mepa_cadj_Log2 -0.028687642
## hs_oxbe_cadj_Log2 0.154362202
## hs_prpa_cadj_Log2 -0.054510955
## hs_trcs_cadj_Log2 0.070675068
## hs_mbzp_cadj_Log2 -0.058018944
## hs_mecpp_cadj_Log2 -0.064497305
## hs_mehhp_cadj_Log2 -0.268510271
## hs_mehp_cadj_Log2 0.002647427
## hs_meohp_cadj_Log2 0.287547932
## hs_mep_cadj_Log2 0.090121251
## hs_mibp_cadj_Log2 -0.206613987
## hs_mnbp_cadj_Log2 -0.101177130
## hs_ohminp_cadj_Log2 0.050695954
## hs_oxominp_cadj_Log2 0.093567426
## FAS_cat_NoneMiddle 0.104943424
## FAS_cat_NoneHigh -0.109557204
## hs_contactfam_3cat_num_NoneOnce a week 0.070483063
## hs_contactfam_3cat_num_NoneLess than once a week -0.212496352
## hs_hm_pers_None 0.342394058
## hs_participation_3cat_None1 organisation 0.180463327
## hs_participation_3cat_None2 or more organisations -0.033609189
## hs_cotinine_cdich_NoneUndetected 0.212382269
## hs_globalexp2_Noneno exposure -0.254942577
## hs_smk_parents_Noneneither -0.057549801
## hs_smk_parents_Noneone 0.595686720
## h_bfdur_Ter(10.8,34.9] 0.727588620
## h_bfdur_Ter(34.9,Inf] 0.081839533
## hs_bakery_prod_Ter(2,6] -0.935890179
## hs_bakery_prod_Ter(6,Inf] -1.323758145
## hs_beverages_Ter(0.132,1] -0.054845401
## hs_beverages_Ter(1,Inf] -0.579006966
## hs_break_cer_Ter(1.1,5.5] -0.098106198
## hs_break_cer_Ter(5.5,Inf] -0.065506925
## hs_caff_drink_Ter(0.132,Inf] -0.782537011
## hs_dairy_Ter(14.6,25.6] 0.368189323
## hs_dairy_Ter(25.6,Inf] -0.429066037
## hs_fastfood_Ter(0.132,0.5] -0.143049164
## hs_fastfood_Ter(0.5,Inf] -0.146808473
## h_legume_preg_Ter(0.5,2] 1.834427047
## h_legume_preg_Ter(2,Inf] 1.633271549
## hs_org_food_Ter(0.132,1] 0.412353623
## hs_org_food_Ter(1,Inf] 0.979063389
## hs_proc_meat_Ter(1.5,4] -0.717534174
## hs_proc_meat_Ter(4,Inf] -1.356799509
## hs_readymade_Ter(0.132,0.5] 0.097849843
## hs_readymade_Ter(0.5,Inf] 0.476693963
## hs_total_bread_Ter(7,17.5] 0.022533599
## hs_total_bread_Ter(17.5,Inf] 1.157117521
## hs_total_cereal_Ter(14.1,23.6] 0.230583442
## hs_total_cereal_Ter(23.6,Inf] -0.624710144
## hs_total_fish_Ter(1.5,3] 0.085310903
## hs_total_fish_Ter(3,Inf] 0.463636214
## hs_total_fruits_Ter(7,14.1] 0.424365903
## hs_total_fruits_Ter(14.1,Inf] 0.767531408
## hs_total_lipids_Ter(3,7] 0.198234122
## hs_total_lipids_Ter(7,Inf] 0.106417238
## hs_total_meat_Ter(6,9] 0.724704590
## hs_total_meat_Ter(9,Inf] 1.801329422
## hs_total_potatoes_Ter(3,4] -0.300293489
## hs_total_potatoes_Ter(4,Inf] -1.081222107
## hs_total_sweets_Ter(4.1,8.5] 0.412272261
## hs_total_sweets_Ter(8.5,Inf] 0.775681553
## hs_total_veg_Ter(6,8.5] 0.148264292
## hs_total_veg_Ter(8.5,Inf] -0.696745189
## hs_total_yog_Ter(6,8.5] -0.013621096
## hs_total_yog_Ter(8.5,Inf] 0.640265151
## metab_1 -0.067031140
## metab_2 -1.655819593
## metab_3 -0.273971293
## metab_4 0.132220924
## metab_5 0.865827052
## metab_6 0.905556903
## metab_7 0.008365954
## metab_8 1.213107849
## metab_9 -3.575054355
## metab_10 1.275958668
## metab_11 -0.993075817
## metab_12 2.594204881
## metab_13 -0.083093574
## metab_14 1.522201743
## metab_15 0.019896756
## metab_16 1.639478636
## metab_17 -0.858710242
## metab_18 -0.538093627
## metab_19 -1.333548959
## metab_20 -2.524130686
## metab_21 1.282190525
## metab_22 -1.535239831
## metab_23 -2.586707766
## metab_24 0.493552776
## metab_25 1.924037236
## metab_26 2.903541687
## metab_27 -0.583375912
## metab_28 1.200797272
## metab_29 -0.588684374
## metab_30 -0.412047078
## metab_31 -0.116690829
## metab_32 -0.263755586
## metab_33 0.090423284
## metab_34 -1.943411179
## metab_35 -0.954971999
## metab_36 5.045034286
## metab_37 1.822579794
## metab_38 -0.646848289
## metab_39 0.822713931
## metab_40 -2.059808544
## metab_41 2.048957266
## metab_42 -2.726083485
## metab_43 0.251630242
## metab_44 0.916406913
## metab_45 -0.291572424
## metab_46 1.654466207
## metab_47 -0.167797962
## metab_48 -0.110777249
## metab_49 0.670061011
## metab_50 -0.738202700
## metab_51 0.048289210
## metab_52 -0.180689195
## metab_53 -1.747625359
## metab_54 1.406413549
## metab_55 3.037221875
## metab_56 1.811169086
## metab_57 -1.840943182
## metab_58 1.538750312
## metab_59 -4.453457059
## metab_60 4.734780565
## metab_61 -4.268573722
## metab_62 5.749260579
## metab_63 -0.253071542
## metab_64 -2.256780019
## metab_65 0.621121266
## metab_66 -0.786585874
## metab_67 -2.695592718
## metab_68 1.878148966
## metab_69 1.438360121
## metab_70 1.364251715
## metab_71 -1.615701925
## metab_72 -0.297945615
## metab_73 -1.431107728
## metab_74 -0.613915714
## metab_75 0.490777973
## metab_76 0.504689922
## metab_77 -0.076320279
## metab_78 5.072449934
## metab_79 3.255662836
## metab_80 4.759582884
## metab_81 0.268066869
## metab_82 -7.374042204
## metab_83 -1.880452995
## metab_84 1.445972717
## metab_85 -7.959633012
## metab_86 -0.554923746
## metab_87 0.941531429
## metab_88 0.083118596
## metab_89 -1.811446458
## metab_90 6.151949662
## metab_91 -0.913610544
## metab_92 4.109353456
## metab_93 -1.628904530
## metab_94 0.082147896
## metab_95 2.592016036
## metab_96 -1.674951528
## metab_97 1.517320418
## metab_98 -6.345024907
## metab_99 1.297288771
## metab_100 -1.305066455
## metab_101 -2.648713687
## metab_102 2.811131422
## metab_103 -1.456128528
## metab_104 5.147928463
## metab_105 3.147324364
## metab_106 -3.560162447
## metab_107 0.788594696
## metab_108 4.091843027
## metab_109 -0.972439955
## metab_110 -1.152481773
## metab_111 -4.683154456
## metab_112 -0.461506613
## metab_113 3.098673480
## metab_114 5.321961440
## metab_115 0.138078082
## metab_116 -3.341876094
## metab_117 -1.084178193
## metab_118 -7.991025369
## metab_119 -4.373225414
## metab_120 1.917509626
## metab_121 -1.469831817
## metab_122 2.647977382
## metab_123 9.438136749
## metab_124 4.614616778
## metab_125 1.331638393
## metab_126 0.373219350
## metab_127 -0.064357832
## metab_128 -1.487581681
## metab_129 -1.364206964
## metab_130 -2.949326462
## metab_131 -5.322605966
## metab_132 -2.354006995
## metab_133 -2.425399130
## metab_134 1.661239622
## metab_135 0.614080602
## metab_136 0.000000000
## metab_137 0.000000000
## metab_138 0.000000000
## metab_139 0.000000000
## metab_140 0.000000000
## metab_141 0.000000000
## metab_142 -0.203342309
## metab_143 0.447773357
## metab_144 1.546917145
## metab_145 -0.801748165
## metab_146 -0.277619944
## metab_147 -2.009966086
## metab_148 0.933895096
## metab_149 -1.095851618
## metab_150 0.342836081
## metab_151 0.022699894
## metab_152 -0.207427887
## metab_153 0.002242071
## metab_154 0.174028993
## metab_155 -3.126076778
## metab_156 3.695486519
## metab_157 1.247748689
## metab_158 -0.199469726
## metab_159 -1.271217640
## metab_160 -5.295961773
## metab_161 7.037667008
## metab_162 4.622378628
## metab_163 -6.574688834
## metab_164 0.293544923
## metab_165 1.653481716
## metab_166 -1.628611594
## metab_167 -0.349724282
## metab_168 0.326978664
## metab_169 -0.106594346
## metab_170 -0.144533643
## metab_171 -0.041223024
## metab_172 -0.262396984
## metab_173 1.093523350
## metab_174 0.724776147
## metab_175 0.008028654
## metab_176 -0.180030926
## metab_177 0.687833999
group_lasso_predictions <- predict(group_lasso_model, newdata = x_test, type = "response")
# convert probabilities to binary predictions
binary_predictions <- ifelse(group_lasso_predictions > 0.5, 1, 0)
accuracy <- mean(binary_predictions == y_test)
cat("Group LASSO Accuracy on Test Set:", accuracy, "\n")
## Group LASSO Accuracy on Test Set: 0.9162011
# confusionMatrix() treats the first factor level ("0") as the positive class by default
conf_matrix <- confusionMatrix(factor(binary_predictions), factor(y_test))
conf_matrix
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 168 19
## 1 11 160
##
## Accuracy : 0.9162
## 95% CI : (0.8825, 0.9427)
## No Information Rate : 0.5
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.8324
##
## Mcnemar's Test P-Value : 0.2012
##
## Sensitivity : 0.9385
## Specificity : 0.8939
## Pos Pred Value : 0.8984
## Neg Pred Value : 0.9357
## Prevalence : 0.5000
## Detection Rate : 0.4693
## Detection Prevalence : 0.5223
## Balanced Accuracy : 0.9162
##
## 'Positive' Class : 0
##
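Because `confusionMatrix()` took the first factor level ("0", a BMI z-score at or below the median) as the positive class, the sensitivity and specificity above describe how well the model identifies the lower-BMI group. As a check, the base-R sketch below recomputes the statistics by hand from the four printed counts, once with "0" and once with "1" (above-median BMI z-score) as the positive class. The helper name `class_metrics` is hypothetical and the counts are simply copied from the output above.

```r
# Counts copied from the confusion matrix printed above
# (rows = prediction, columns = reference).
cm <- matrix(c(168, 11, 19, 160), nrow = 2,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

# Hypothetical helper: classification metrics for a chosen positive class
class_metrics <- function(cm, positive) {
  neg <- setdiff(colnames(cm), positive)
  tp <- cm[positive, positive]   # predicted positive, truly positive
  fn <- cm[neg, positive]        # predicted negative, truly positive
  fp <- cm[positive, neg]        # predicted positive, truly negative
  tn <- cm[neg, neg]             # predicted negative, truly negative
  sens <- tp / (tp + fn)
  spec <- tn / (tn + fp)
  c(sensitivity = sens,
    specificity = spec,
    ppv = tp / (tp + fp),
    npv = tn / (tn + fn),
    balanced_accuracy = (sens + spec) / 2)
}

round(class_metrics(cm, positive = "0"), 4)  # reproduces the caret output
round(class_metrics(cm, positive = "1"), 4)  # above-median BMI as positive
```

Within caret, the same switch is made by passing `positive = "1"` to `confusionMatrix()`: sensitivity and specificity (and PPV and NPV) swap, while accuracy, kappa and balanced accuracy are unchanged.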
# ROC Curve and AUC
roc_curve <- roc(y_test, group_lasso_predictions)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
plot(roc_curve, main = "ROC Curve for Group LASSO Model (with metabolomics)")
auc_value <- auc(roc_curve)
cat("Group LASSO AUC on Test Set:", auc_value, "\n")
## Group LASSO AUC on Test Set: 0.9803377
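For the empirical ROC curve that `pROC::roc()` builds by default, the AUC equals the probability that a randomly chosen case (class 1) receives a higher predicted probability than a randomly chosen control (class 0), with ties counted as one half. A minimal base-R sketch of that rank-based (Mann-Whitney) computation follows; the function name and the six scores are hypothetical illustration data, not values from the cohort.

```r
# Rank-based (Mann-Whitney) estimate of the ROC AUC.
# `labels` are 0/1 outcomes, `scores` are predicted probabilities.
auc_mann_whitney <- function(labels, scores) {
  r <- rank(scores)                       # average ranks, so ties count as 1/2
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  u <- sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2   # Mann-Whitney U for cases
  u / (n_pos * n_neg)
}

# Hypothetical example data (not from the cohort)
labels <- c(0, 0, 1, 1, 0, 1)
scores <- c(0.10, 0.40, 0.35, 0.80, 0.20, 0.90)
auc_mann_whitney(labels, scores)   # 8 of 9 case/control pairs correctly ordered
```

The `Setting direction: controls < cases` message above indicates that pROC used the same orientation, i.e. higher predicted probabilities are taken as evidence for class 1.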
selected_data <- selected_data %>% na.omit()
# Dichotomise the BMI z-score at its median: 1 = above the median, 0 = at or below it
median_value <- median(selected_data$hs_zbmi_who, na.rm = TRUE)
selected_data$hs_zbmi_who_binary <- ifelse(selected_data$hs_zbmi_who > median_value, 1, 0)
set.seed(101)
trainIndex <- createDataPartition(selected_data$hs_zbmi_who_binary, p = .7, list = FALSE, times = 1)
train_data <- selected_data[trainIndex,]
test_data <- selected_data[-trainIndex,]
train_data_clean <- train_data[complete.cases(train_data), ]
x_train <- model.matrix(hs_zbmi_who_binary ~ . - hs_zbmi_who, data = train_data_clean)[,-1]
y_train <- as.numeric(train_data_clean$hs_zbmi_who_binary)
test_data_clean <- test_data[complete.cases(test_data), ]
x_test <- model.matrix(hs_zbmi_who_binary ~ . - hs_zbmi_who, data = test_data_clean)[,-1]
y_test <- as.numeric(test_data_clean$hs_zbmi_who_binary)
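According to the caret documentation, `createDataPartition()` samples within each class when `y` is a factor, but for a numeric `y` it samples within percentile-based groups instead. Since `hs_zbmi_who_binary` is numeric 0/1 here, the class balance of the split is not requested directly. The base-R sketch below shows a 70/30 split stratified by class; the helper `stratified_split` and the outcome vector are hypothetical, and this is an illustration of class-wise sampling rather than a re-implementation of caret.

```r
# Hypothetical helper: split row indices so that each outcome class
# contributes (approximately) the same proportion p to the training set.
stratified_split <- function(y, p = 0.7) {
  idx_by_class <- split(seq_along(y), y)        # row indices for each class
  train_idx <- lapply(idx_by_class, function(i) {
    n_train <- round(p * length(i))
    i[sample.int(length(i), size = n_train)]    # sample within the class
  })
  sort(unlist(train_idx, use.names = FALSE))
}

# Hypothetical outcome vector (not the cohort data): 40 zeros, 60 ones
set.seed(101)
y <- rep(c(0, 1), times = c(40, 60))
train_idx <- stratified_split(y)
table(y[train_idx])   # 70% of each class goes to training
```

Within caret, passing `factor(selected_data$hs_zbmi_who_binary)` to `createDataPartition()` would request the same class-wise sampling.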
num_chemicals <- length(chemicals_full)
num_diet <- length(postnatal_diet)
num_covariates <- ncol(outcome_and_cov) - 2 # excluding outcome and binary outcome
total_length <- num_chemicals + num_diet + num_covariates
group_indices <- c(
rep(1, num_chemicals), # Group 1: Chemicals
rep(2, num_diet), # Group 2: Postnatal diet
rep(3, num_covariates) # Group 3: Covariates (excluding outcome)
)
length(group_indices) == ncol(x_train)
## [1] FALSE
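# length() counts variables, but model.matrix() expands each factor into several dummy
# columns, so group_indices is shorter than ncol(x_train). Padding with group 4 makes the
# lengths agree but does not ensure that each column sits in its intended group.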
if (length(group_indices) < ncol(x_train)) {
group_indices <- c(group_indices, rep(4, ncol(x_train) - length(group_indices)))
}
length(group_indices) == ncol(x_train)
## [1] TRUE
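Because `grplasso()` assigns columns to groups purely by position in `index`, a more robust way to build that vector is to start from the `assign` attribute of the model matrix, which records the formula term each column came from, and then map each term to its block. The sketch below does this on a small hypothetical data frame. The variable names, the `block_of` and `block_id` lookups, and the choice to keep the intercept column and mark it `NA` (which the grplasso documentation describes as a non-penalised coefficient) are assumptions for illustration, not the original pipeline.

```r
# Hypothetical toy data: one covariate factor, one chemical, one diet factor
df <- data.frame(
  outcome = c(0, 1, 0, 1, 1, 0),
  cohort  = factor(c("a", "b", "c", "a", "b", "c")),
  pfoa    = c(1.2, 0.8, 1.5, 2.0, 0.9, 1.1),
  bread   = factor(c("low", "mid", "high", "mid", "low", "high"))
)

# Hypothetical lookups: variable name -> block, block -> group number
block_of <- c(cohort = "covariates", pfoa = "chemicals", bread = "diet")
block_id <- c(chemicals = 1, diet = 2, covariates = 3)

mm <- model.matrix(outcome ~ ., data = df)       # intercept column kept
term_labels <- attr(terms(outcome ~ ., data = df), "term.labels")
assign_vec <- attr(mm, "assign")                 # 0 = intercept, k = k-th term

# Intercept stays NA (unpenalised); every other column inherits its term's group
group_index <- rep(NA_real_, ncol(mm))
is_term <- assign_vec > 0
group_index[is_term] <- block_id[block_of[term_labels[assign_vec[is_term]]]]
names(group_index) <- colnames(mm)
group_index
```

With this approach the design matrix would be used without `[,-1]`, so every dummy column of a factor lands in the same group as its parent variable regardless of column order.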
# x_train has no intercept column (model.matrix()[,-1] dropped it), hence the message below
group_lasso_model <- grplasso(x_train, y_train, index = group_indices, lambda = 0.1, model = LogReg())
## Couldn't find intercept. Setting center = FALSE.
## Lambda: 0.1 nr.var: 116
group_lasso_coef <- coef(group_lasso_model)
print(group_lasso_coef)
## 0.1
## hs_child_age_None -0.356670389
## h_cohort2 2.080082445
## h_cohort3 2.244021749
## h_cohort4 1.877777762
## h_cohort5 1.107982852
## h_cohort6 0.810120894
## e3_sex_Nonemale 0.261309697
## e3_yearbir_None2004 -0.348262507
## e3_yearbir_None2005 -0.013627882
## e3_yearbir_None2006 0.199800451
## e3_yearbir_None2007 0.475555150
## e3_yearbir_None2008 0.572926802
## e3_yearbir_None2009 1.283645336
## h_edumc_None2 0.487064624
## h_edumc_None3 0.428197818
## h_native_None1 0.303179226
## h_native_None2 -0.026867597
## hs_as_c_Log2 0.047859058
## hs_cd_c_Log2 -0.054729103
## hs_co_c_Log2 -0.045209055
## hs_cs_c_Log2 0.424788719
## hs_cu_c_Log2 0.715611667
## hs_hg_c_Log2 -0.022150649
## hs_mn_c_Log2 -0.188426591
## hs_mo_c_Log2 -0.236095624
## hs_pb_c_Log2 -0.208347348
## hs_tl_cdich_NoneUndetected -0.185692667
## hs_dde_cadj_Log2 -0.126775335
## hs_ddt_cadj_Log2 0.011828913
## hs_hcb_cadj_Log2 -0.172818606
## hs_pcb118_cadj_Log2 0.398541987
## hs_pcb138_cadj_Log2 -0.531672249
## hs_pcb153_cadj_Log2 -0.564358218
## hs_pcb170_cadj_Log2 -0.110859985
## hs_pcb180_cadj_Log2 0.082786761
## hs_dep_cadj_Log2 -0.064805157
## hs_detp_cadj_Log2 0.028544607
## hs_dmdtp_cdich_NoneUndetected 0.020769293
## hs_dmp_cadj_Log2 0.001881927
## hs_dmtp_cadj_Log2 -0.009554098
## hs_pbde153_cadj_Log2 -0.067721108
## hs_pbde47_cadj_Log2 0.035527828
## hs_pfhxs_c_Log2 0.148702463
## hs_pfna_c_Log2 -0.063861939
## hs_pfoa_c_Log2 -0.370342540
## hs_pfos_c_Log2 -0.014209659
## hs_pfunda_c_Log2 0.085625611
## hs_bpa_cadj_Log2 0.025130251
## hs_bupa_cadj_Log2 -0.025269170
## hs_etpa_cadj_Log2 0.012228006
## hs_mepa_cadj_Log2 -0.049914233
## hs_oxbe_cadj_Log2 0.039232139
## hs_prpa_cadj_Log2 -0.004796878
## hs_trcs_cadj_Log2 0.053379535
## hs_mbzp_cadj_Log2 0.199557578
## hs_mecpp_cadj_Log2 -0.024297118
## hs_mehhp_cadj_Log2 -0.080600345
## hs_mehp_cadj_Log2 -0.036747026
## hs_meohp_cadj_Log2 0.039089402
## hs_mep_cadj_Log2 0.081237526
## hs_mibp_cadj_Log2 -0.082520739
## hs_mnbp_cadj_Log2 -0.154000014
## hs_ohminp_cadj_Log2 -0.168640597
## hs_oxominp_cadj_Log2 0.204393694
## FAS_cat_NoneMiddle 0.272770735
## FAS_cat_NoneHigh 0.308266250
## hs_contactfam_3cat_num_NoneOnce a week -0.102710129
## hs_contactfam_3cat_num_NoneLess than once a week 0.367772506
## hs_hm_pers_None 0.189639644
## hs_participation_3cat_None1 organisation -0.335643330
## hs_participation_3cat_None2 or more organisations 0.439534540
## hs_cotinine_cdich_NoneUndetected -0.287539701
## hs_globalexp2_Noneno exposure -0.251656243
## hs_smk_parents_Noneneither -0.222689644
## hs_smk_parents_Noneone -0.130069597
## h_bfdur_Ter(10.8,34.9] 0.169982955
## h_bfdur_Ter(34.9,Inf] 0.490400946
## hs_bakery_prod_Ter(2,6] -0.437361137
## hs_bakery_prod_Ter(6,Inf] -0.715997334
## hs_beverages_Ter(0.132,1] -0.166902018
## hs_beverages_Ter(1,Inf] -0.251371726
## hs_break_cer_Ter(1.1,5.5] 0.081253442
## hs_break_cer_Ter(5.5,Inf] -0.164224017
## hs_caff_drink_Ter(0.132,Inf] 0.089548240
## hs_dairy_Ter(14.6,25.6] 0.114768437
## hs_dairy_Ter(25.6,Inf] -0.066498478
## hs_fastfood_Ter(0.132,0.5] -0.016244064
## hs_fastfood_Ter(0.5,Inf] -0.106404095
## h_legume_preg_Ter(0.5,2] -0.426554164
## h_legume_preg_Ter(2,Inf] -0.310670453
## hs_org_food_Ter(0.132,1] 0.152885149
## hs_org_food_Ter(1,Inf] 0.087895855
## hs_proc_meat_Ter(1.5,4] -0.236516197
## hs_proc_meat_Ter(4,Inf] -0.146392530
## hs_readymade_Ter(0.132,0.5] 0.114545688
## hs_readymade_Ter(0.5,Inf] 0.116905445
## hs_total_bread_Ter(7,17.5] -0.291401746
## hs_total_bread_Ter(17.5,Inf] -0.483050307
## hs_total_cereal_Ter(14.1,23.6] 0.168000362
## hs_total_cereal_Ter(23.6,Inf] 0.420439013
## hs_total_fish_Ter(1.5,3] -0.028040986
## hs_total_fish_Ter(3,Inf] 0.200730621
## hs_total_fruits_Ter(7,14.1] 0.217953086
## hs_total_fruits_Ter(14.1,Inf] 0.240356047
## hs_total_lipids_Ter(3,7] -0.231835871
## hs_total_lipids_Ter(7,Inf] -0.282130710
## hs_total_meat_Ter(6,9] -0.022338528
## hs_total_meat_Ter(9,Inf] -0.168235872
## hs_total_potatoes_Ter(3,4] -0.041096274
## hs_total_potatoes_Ter(4,Inf] 0.072095459
## hs_total_sweets_Ter(4.1,8.5] -0.169634345
## hs_total_sweets_Ter(8.5,Inf] 0.058834787
## hs_total_veg_Ter(6,8.5] 0.154459067
## hs_total_veg_Ter(8.5,Inf] -0.127902981
## hs_total_yog_Ter(6,8.5] -0.073834939
## hs_total_yog_Ter(8.5,Inf] -0.078514424
group_lasso_predictions <- predict(group_lasso_model, newdata = x_test, type = "response")
binary_predictions <- ifelse(group_lasso_predictions > 0.5, 1, 0)
accuracy <- mean(binary_predictions == y_test)
cat("Group LASSO Accuracy on Test Set:", accuracy, "\n")
## Group LASSO Accuracy on Test Set: 0.6205128
conf_matrix <- confusionMatrix(factor(binary_predictions), factor(y_test))
conf_matrix
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 125 84
## 1 64 117
##
## Accuracy : 0.6205
## 95% CI : (0.5703, 0.6689)
## No Information Rate : 0.5154
## P-Value [Acc > NIR] : 1.845e-05
##
## Kappa : 0.2427
##
## Mcnemar's Test P-Value : 0.1183
##
## Sensitivity : 0.6614
## Specificity : 0.5821
## Pos Pred Value : 0.5981
## Neg Pred Value : 0.6464
## Prevalence : 0.4846
## Detection Rate : 0.3205
## Detection Prevalence : 0.5359
## Balanced Accuracy : 0.6217
##
## 'Positive' Class : 0
##
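The kappa and McNemar statistics in this output can be reproduced from the four cell counts alone, which makes clear what they measure: kappa compares the observed agreement with the agreement expected by chance from the row and column totals, and McNemar's test compares the two off-diagonal error counts (84 vs 64). A base-R sketch using the counts printed above (the helper name `cohen_kappa` is hypothetical):

```r
# Counts copied from the confusion matrix above (rows = prediction, columns = reference)
cm <- matrix(c(125, 64, 84, 117), nrow = 2,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

# Hypothetical helper: Cohen's kappa from a square confusion matrix
cohen_kappa <- function(cm) {
  n <- sum(cm)
  p_obs <- sum(diag(cm)) / n                      # observed agreement (accuracy)
  p_exp <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
  (p_obs - p_exp) / (1 - p_exp)
}

round(cohen_kappa(cm), 4)
# McNemar's test on the off-diagonal cells (continuity-corrected by default)
round(mcnemar.test(cm)$p.value, 4)
```

Both values agree with the caret output. Chance agreement here is about 0.499, so an accuracy of 0.6205 is only about 12 percentage points above chance, which is why kappa is modest.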
roc_curve <- roc(y_test, group_lasso_predictions)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
plot(roc_curve, main = "ROC Curve for Group LASSO Model (without metabolomics)")
auc_value <- auc(roc_curve)
cat("Group LASSO AUC on Test Set:", auc_value, "\n")
## Group LASSO AUC on Test Set: 0.6899102